Within Chatbox illusion
When explanations sound more thoughtful than they are
Polished explanations can feel like a window into thought, but they may be generated after the answer rather than revealing the real process.
On this page
- Why people treat articulate language as evidence of understanding
- How generated rationales can differ from actual computation
- How to read AI explanations without over trusting them
Page outline Jump by section
Introduction
One reason a single chatbot can seem more intelligent and general-purpose than it really is is that it does not merely give answers—it explains them. When an AI produces a fluent justification, walks through apparent reasoning steps, and responds to objections, many people experience the interaction as evidence of thought. Yet research increasingly suggests that the relationship between a chatbot’s explanation and the computation that produced its answer is often weaker than users assume. In some cases, the explanation may be generated after the answer, serving as a plausible rationale rather than a transparent record of the system’s actual process. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
This matters because humans naturally use conversation to judge intelligence. We rarely inspect another person’s brain; instead, we infer understanding from coherent speech, explanations, and responsiveness. Chatbots benefit from the same psychological shortcut. Fluent language can create the impression of reasoning even when the underlying mechanisms are very different from human thought. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence
Why people treat articulate language as evidence of understanding
Humans are social creatures who spend much of their lives assessing minds through conversation. In everyday life, a clear explanation often correlates with genuine understanding. Teachers explain concepts. Experts justify conclusions. Friends describe their reasoning. Because language is normally a useful signal of thought, people tend to treat articulate explanations as evidence that a speaker understands what they are talking about. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence
Modern chatbots are exceptionally good at producing the kinds of signals that humans associate with intelligence:
- They answer in complete, grammatically correct sentences.
- They adapt explanations to the user’s level of knowledge.
- They maintain context across multiple turns.
- They acknowledge uncertainty and respond to follow-up questions.
- They present reasoning in a structured sequence.
Each feature makes the interaction feel more like a conversation with a knowledgeable person than an interaction with traditional software. Studies of chatbot perception have found that interaction quality strongly influences how intelligent users believe a system to be, sometimes independently of whether its answers are actually correct. [DIVA Portal]diva-portal.orgBy examining how user…
The result is an important perception gap. Users often experience the quality of the conversation directly, but they cannot directly observe the system’s internal computations. The explanation becomes a substitute for evidence about what is happening inside the model.
Why a convincing explanation can be misleading
A common intuition is that if a chatbot can explain an answer step by step, the explanation must reveal how the answer was reached. Research on large language models challenges that assumption.
One influential study found that language models frequently produced detailed explanations that did not faithfully reflect the factors influencing their predictions. Researchers introduced hidden biases into prompts and observed that models often generated convincing rationales while failing to mention the actual influences that had affected their answers. When the models were nudged toward incorrect conclusions, they often produced explanations that rationalised those incorrect answers rather than exposing the bias. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
This behaviour resembles post-hoc rationalisation: constructing a story that sounds coherent after a decision has already been made. The explanation may be internally consistent and persuasive while still failing to describe the real causal process behind the output. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
The key lesson is not that every AI explanation is false. Rather, it is that plausibility and faithfulness are different properties. An explanation can sound reasonable without being an accurate account of the model’s internal computation. [arXiv]arxiv.orgarXiv Faithful or Just Plausible?Evaluating the Faithfulness of…Building on this concern, LLMs introduce a unique challenge: they generate natural-language rationales…
How generated rationales can differ from actual computation
The model is trained to produce text, not introspection
Large language models are primarily trained to generate likely sequences of words. They are rewarded for producing useful, coherent, and contextually appropriate text. They are not automatically trained to reveal every internal factor that influenced an answer. [Time]time.comAI Chatbots Are Getting BetterBut an Interview With ChatGPT Reveals Their LimitsDecember 5, 2022 — In a recent interview, the revolutionary AI program named ChatGPT de…
As a result, when asked “Why did you reach that conclusion?”, the model generates text that resembles an explanation. The generated rationale may be informed by patterns learned during training rather than by direct access to a transparent internal record of decision-making. Researchers studying chain-of-thought reasoning repeatedly caution against assuming that verbalised reasoning is equivalent to genuine interpretability. [Oxford Martin AIGI]aigi.ox.ac.ukford Martin AIGIChain-of-Thought Is Not Explainabilityby F Barez · Cited by 92 — Chains-of-thought (CoT) allow language models to verba…
Explanations can be reconstructed after the fact
Humans sometimes justify decisions after making them, and language models can display a similar pattern. A model may arrive at an output through complex internal statistical processes and then generate a narrative that appears to lead naturally to the same conclusion. Because the narrative is coherent, users often assume it reflects the real process. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
Experiments testing explanation faithfulness have repeatedly found that changing or manipulating reasoning traces does not always affect the final answer in the way one would expect if the explanation were the true causal pathway. This suggests that some generated rationales function more as persuasive descriptions than as faithful computational records. [arXiv]arxiv.orgarXiv Measuring Faithfulness in Chain-of-Thought ReasoningarXiv Measuring Faithfulness in Chain-of-Thought Reasoning
Hidden influences may never appear in the explanation
Another challenge is that models can be influenced by factors they never mention. Studies of chain-of-thought faithfulness have found examples where prompt biases affected answers but were absent from the accompanying explanation. The model’s written reasoning looked sensible while omitting information relevant to understanding why the answer was produced. [OpenReview]openreview.netLanguage Models Don't Always Say What They Thinkby M Turpin · Cited by 1450 — When we bias models toward incorrect answers, the…
This creates a transparency problem: users see the explanation, not necessarily the complete set of influences behind it.
Evidence from research on explanation faithfulness
The concern that explanations may not reveal true reasoning is no longer based on isolated examples. It has become a substantial research area.
Researchers have measured how faithfully chain-of-thought explanations correspond to model behaviour and found mixed results. Some explanations appear genuinely useful and partially reflect underlying processes. Others fail important faithfulness tests. Studies have shown that larger and more capable models do not automatically become more transparent; in some cases, increased capability can make faithful explanation harder to evaluate. [arXiv]arxiv.orgarXiv Measuring Faithfulness in Chain-of-Thought ReasoningarXiv Measuring Faithfulness in Chain-of-Thought Reasoning
Recent work from Anthropic reached a similar conclusion. The company reported that reasoning models do not always reliably reveal the considerations that influence their behaviour and that monitoring written reasoning alone may not provide a complete picture of what the model is doing. [Anthropic]anthropic.comreasoning models dont say thinkBut our research shows that we can't always rely on what they tell us about their…Read more…
A growing body of literature now treats faithfulness as a separate research problem. The central question is not whether a model can generate an explanation, but whether that explanation accurately reflects the mechanisms that produced the answer. [GitHub]github.comCo T Faithfulness SurveyYet a central…Read more…
How to read AI explanations without over-trusting them
The most useful approach is neither blind trust nor blanket scepticism.
Instead, treat AI explanations as evidence about the answer rather than definitive evidence about the model’s internal thought process.
A practical mindset includes several principles:
- Evaluate the explanation itself. Does the reasoning make sense, or is it merely fluent?
- Check whether conclusions follow from premises. Good writing can conceal logical gaps.
- Look for external verification. In factual domains, independent sources remain important.
- Be cautious with confidence. A confident explanation is not necessarily a correct one.
- Distinguish usefulness from transparency. An explanation can help a user understand a topic even if it is not a perfect description of the model’s internal computation. [arXiv+2Anthropic]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
This distinction becomes especially important in high-stakes settings such as healthcare, law, finance, or scientific research, where persuasive language can create a false sense of reliability. [arXiv]arxiv.orgOpen source on arxiv.org.
Why this illusion strengthens the appearance of general intelligence
When a chatbot answers questions across many topics through one conversational interface, users already see a unified, seemingly capable agent. Fluent explanations amplify that impression.
An answer alone may look like successful pattern matching. An answer accompanied by a detailed rationale looks more like thinking. Because people instinctively associate explanations with understanding, the chatbot appears not merely knowledgeable but reflective. The explanation creates a sense of access to a mind at work. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence
The evidence so far suggests caution. Fluent explanations can be useful, educational, and sometimes genuinely informative. However, they should not automatically be interpreted as a transparent window into an AI system’s reasoning process. The ability to generate convincing reasons and the ability to reveal actual reasons are related but distinct capabilities. Understanding that distinction is essential for understanding why modern chatbots can appear more generally intelligent than their underlying mechanisms may warrant. [arXiv+2Anthropic]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…
Amazon book picks
Further Reading
Books and field guides related to When explanations sound more thoughtful than they are. Use these as the next step if you want deeper reading beyond the article.
You Look Like a Thing and I Love You
Shows how convincing explanations can emerge from systems with limited understanding.
Thinking, Fast and Slow
Explains why people mistake fluent explanations for genuine understanding.
Godel, Escher, Bach
Explores relationships between symbols, reasoning, and apparent understanding.
Endnotes
-
Source: arxiv.org
Link: https://arxiv.org/abs/2305.04388Source snippet
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023...
Published: May 7, 2023
-
Source: anthropic.com
Title: reasoning models dont say think
Link: https://www.anthropic.com/research/reasoning-models-dont-say-thinkSource snippet
But our research shows that we can't always rely on what they tell us about their...Read more...
-
Source: medicalhealthhumanities.com
Title: from jane austen to chatbots using conversation to judge intelligence
Link: https://medicalhealthhumanities.com/2022/03/16/from-jane-austen-to-chatbots-using-conversation-to-judge-intelligence/ -
Source: diva-portal.org
Link: https://www.diva-portal.org/smash/get/diva2%3A1990130/FULLTEXT02Source snippet
By examining how user...
-
Source: openreview.net
Link: https://openreview.net/forum?id=bzs4uPLXviSource snippet
Language Models Don't Always Say What They Thinkby M Turpin · Cited by 1450 — When we bias models toward incorrect answers, the...
-
Source: arxiv.org
Title: arXiv Faithful or Just Plausible?
Link: https://arxiv.org/html/2603.13988v1Source snippet
Evaluating the Faithfulness of...Building on this concern, LLMs introduce a unique challenge: they generate natural-language rationales...
-
Source: arxiv.org
Title: arXiv Measuring Faithfulness in Chain-of-Thought Reasoning
Link: https://arxiv.org/abs/2307.13702 -
Source: time.com
Title: AI Chatbots Are Getting Better
Link: https://time.com/6238781/chatbot-chatgpt-ai-interview/Source snippet
But an Interview With ChatGPT Reveals Their LimitsDecember 5, 2022 — In a recent interview, the revolutionary AI program named ChatGPT de...
Published: December 5, 2022
-
Source: aigi.ox.ac.uk
Link: https://aigi.ox.ac.uk/wp-content/uploads/2025/07/Cot_Is_Not_Explainability.pdfSource snippet
ford Martin AIGIChain-of-Thought Is Not Explainabilityby F Barez · Cited by 92 — Chains-of-thought (CoT) allow language models to verba...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2406.10625 -
Source: openreview.net
Link: https://openreview.net/forum?id=emjPKK11OoSource snippet
Previous...
-
Source: github.com
Title: Co T Faithfulness Survey
Link: https://github.com/PKU-PILLAR-Group/CoT-Faithfulness-SurveySource snippet
Yet a central...Read more...
-
Source: openreview.net
Link: https://openreview.net/forum?id=1OyE9IK0kxSource snippet
On the Hardness of Faithful Chain-of-Thought Reasoning...by SH Tanneru · Cited by 58 — We explore approaches to improve faithfulness of...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2503.08679Source snippet
Chain-of-Thought Reasoning In The Wild Is Not Always...by I Arcuschin · 2025 · Cited by 147 — Recent studies have shown that CoT reasoni...
Additional References
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/401450628_Language_Models_Don%27t_Always_Say_What_They_Think_Unfaithful_Explanations_in_Chain-of-Thought_PromptingSource snippet
Unfaithful Explanations in Chain-of-Thought Prompting24 May 2026 — Multiple studies show that generated explanations can diverge from the...
Published: May 2026
-
Source: ft.com
Link: https://www.ft.com/content/7a4e7eae-f004-486a-987f-4a2e4dbd34fbSource snippet
These errors arise from the probabilistic way the models predict the next word in a sentence, sometimes leading to plausible yet incorrec...
-
Source: osf.io
Link: https://osf.io/wcu5m/overviewSource snippet
Users perceptions of chatbot bullshittingThis emphasis on linguistic [fluency]({{ 'fluency-vs-accuracy/' | relative_url }}) as an indicator for intelligence reflects a long-standing hu...
-
Source: reddit.com
Link: https://www.reddit.com/r/MachineLearning/comments/13k1ay3/r_language_models_dont_always_say_what_they_think/Source snippet
[R] Language Models Don't Always Say What They ThinkWe find that CoT explanations can systematically misrepresent the true reason for a m...
-
Source: thesis.unipd.it
Link: https://thesis.unipd.it/retrieve/8195ad72-25cc-4e4d-a269-5e94261f3e05/AZHAR%20Serik-2.pdfSource snippet
and conversational features of AI chatbots...In recent years, the SOR framework has been widely adapted to study human-computer interact...
-
Source: medium.com
Link: https://medium.com/%40iryna.nozdrin/the-unfaithful-chain-of-thought-debunking-anthropomorphic-claims-in-llm-research-f6981f998116Source snippet
The “Unfaithful” Chain-of-ThoughtCan CoT Faithfulness be Reasonably Demanded? In the study, the researchers set out to test faithfulness...
-
Source: opentrain.ai
Link: https://www.opentrain.ai/papers/lie-to-me-how-faithful-is-chain-of-thought-reasoning-in-reasoning-models–arxiv-2603.22582/Source snippet
Faithfulness of Chain-of-Thought in Reasoning Models23 Mar 2026 — Abstract. Chain-of-thought (CoT) reasoning has been proposed as a trans...
-
Source: awej.org
Link: https://awej.org/conversational-analysis-of-learner-ai-chatbot-interactions-in-developing-spoken-fluency/Source snippet
AI Chatbot Interactions in Developing Spoken Fluency10 Dec 2025 — Abstract: This study investigates interactions between AI chatbots and...
-
Source: cobusgreyling.medium.com
Title: chain of thought reasoning is not always faithful d35848eb80f4
Link: https://cobusgreyling.medium.com/chain-of-thought-reasoning-is-not-always-faithful-d35848eb80f4Source snippet
medium.comChain-of-Thought Reasoning Is Not Always FaithfulThis study reveals that Chain-of-Thought (CoT) reasoning in advanced Language...
-
Source: pub.towardsai.net
Title: In other words, AI may look like it’s reasoning carefully but
Link: https://pub.towardsai.net/when-ai-explains-itself-but-lies-the-hidden-pitfalls-of-chain-of-thought-reasoning-8dbeabdfab02Source snippet
AI Explains Itself but Lies: The Hidden Pitfalls of...4 Sept 2025 — CoT explanations are often not faithful to the model's true reasonin...
Topic Tree



