When explanations sound more thoughtful than they are

Introduction

One reason a single chatbot can seem more intelligent and general-purpose than it really is is that it does not merely give answers—it explains them. When an AI produces a fluent justification, walks through apparent reasoning steps, and responds to objections, many people experience the interaction as evidence of thought. Yet research increasingly suggests that the relationship between a chatbot’s explanation and the computation that produced its answer is often weaker than users assume. In some cases, the explanation may be generated after the answer, serving as a plausible rationale rather than a transparent record of the system’s actual process. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

Fluent Reasons illustration 1 This matters because humans naturally use conversation to judge intelligence. We rarely inspect another person’s brain; instead, we infer understanding from coherent speech, explanations, and responsiveness. Chatbots benefit from the same psychological shortcut. Fluent language can create the impression of reasoning even when the underlying mechanisms are very different from human thought. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence

Why people treat articulate language as evidence of understanding

Humans are social creatures who spend much of their lives assessing minds through conversation. In everyday life, a clear explanation often correlates with genuine understanding. Teachers explain concepts. Experts justify conclusions. Friends describe their reasoning. Because language is normally a useful signal of thought, people tend to treat articulate explanations as evidence that a speaker understands what they are talking about. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence

Modern chatbots are exceptionally good at producing the kinds of signals that humans associate with intelligence:

They answer in complete, grammatically correct sentences.
They adapt explanations to the user’s level of knowledge.
They maintain context across multiple turns.
They acknowledge uncertainty and respond to follow-up questions.
They present reasoning in a structured sequence.

Each feature makes the interaction feel more like a conversation with a knowledgeable person than an interaction with traditional software. Studies of chatbot perception have found that interaction quality strongly influences how intelligent users believe a system to be, sometimes independently of whether its answers are actually correct. [DIVA Portal]diva-portal.orgBy examining how user…

The result is an important perception gap. Users often experience the quality of the conversation directly, but they cannot directly observe the system’s internal computations. The explanation becomes a substitute for evidence about what is happening inside the model.

Why a convincing explanation can be misleading

A common intuition is that if a chatbot can explain an answer step by step, the explanation must reveal how the answer was reached. Research on large language models challenges that assumption.

One influential study found that language models frequently produced detailed explanations that did not faithfully reflect the factors influencing their predictions. Researchers introduced hidden biases into prompts and observed that models often generated convincing rationales while failing to mention the actual influences that had affected their answers. When the models were nudged toward incorrect conclusions, they often produced explanations that rationalised those incorrect answers rather than exposing the bias. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

This behaviour resembles post-hoc rationalisation: constructing a story that sounds coherent after a decision has already been made. The explanation may be internally consistent and persuasive while still failing to describe the real causal process behind the output. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

The key lesson is not that every AI explanation is false. Rather, it is that plausibility and faithfulness are different properties. An explanation can sound reasonable without being an accurate account of the model’s internal computation. [arXiv]arxiv.orgarXiv Faithful or Just Plausible?Evaluating the Faithfulness of…Building on this concern, LLMs introduce a unique challenge: they generate natural-language rationales…

How generated rationales can differ from actual computation

The model is trained to produce text, not introspection

Large language models are primarily trained to generate likely sequences of words. They are rewarded for producing useful, coherent, and contextually appropriate text. They are not automatically trained to reveal every internal factor that influenced an answer. [Time]time.comAI Chatbots Are Getting BetterBut an Interview With ChatGPT Reveals Their LimitsDecember 5, 2022 — In a recent interview, the revolutionary AI program named ChatGPT de…Published: December 5, 2022

As a result, when asked “Why did you reach that conclusion?”, the model generates text that resembles an explanation. The generated rationale may be informed by patterns learned during training rather than by direct access to a transparent internal record of decision-making. Researchers studying chain-of-thought reasoning repeatedly caution against assuming that verbalised reasoning is equivalent to genuine interpretability. [Oxford Martin AIGI]aigi.ox.ac.ukford Martin AIGIChain-of-Thought Is Not Explainabilityby F Barez · Cited by 92 — Chains-of-thought (CoT) allow language models to verba…

Fluent Reasons illustration 2

Explanations can be reconstructed after the fact

Humans sometimes justify decisions after making them, and language models can display a similar pattern. A model may arrive at an output through complex internal statistical processes and then generate a narrative that appears to lead naturally to the same conclusion. Because the narrative is coherent, users often assume it reflects the real process. [arXiv]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

Experiments testing explanation faithfulness have repeatedly found that changing or manipulating reasoning traces does not always affect the final answer in the way one would expect if the explanation were the true causal pathway. This suggests that some generated rationales function more as persuasive descriptions than as faithful computational records. [arXiv]arxiv.orgarXiv Measuring Faithfulness in Chain-of-Thought ReasoningarXiv Measuring Faithfulness in Chain-of-Thought Reasoning

Hidden influences may never appear in the explanation

Another challenge is that models can be influenced by factors they never mention. Studies of chain-of-thought faithfulness have found examples where prompt biases affected answers but were absent from the accompanying explanation. The model’s written reasoning looked sensible while omitting information relevant to understanding why the answer was produced. [OpenReview]openreview.netLanguage Models Don't Always Say What They Thinkby M Turpin · Cited by 1450 — When we bias models toward incorrect answers, the…

This creates a transparency problem: users see the explanation, not necessarily the complete set of influences behind it.

Evidence from research on explanation faithfulness

The concern that explanations may not reveal true reasoning is no longer based on isolated examples. It has become a substantial research area.

Researchers have measured how faithfully chain-of-thought explanations correspond to model behaviour and found mixed results. Some explanations appear genuinely useful and partially reflect underlying processes. Others fail important faithfulness tests. Studies have shown that larger and more capable models do not automatically become more transparent; in some cases, increased capability can make faithful explanation harder to evaluate. [arXiv]arxiv.orgarXiv Measuring Faithfulness in Chain-of-Thought ReasoningarXiv Measuring Faithfulness in Chain-of-Thought Reasoning

Recent work from Anthropic reached a similar conclusion. The company reported that reasoning models do not always reliably reveal the considerations that influence their behaviour and that monitoring written reasoning alone may not provide a complete picture of what the model is doing. [Anthropic]anthropic.comreasoning models dont say thinkBut our research shows that we can't always rely on what they tell us about their…Read more…

A growing body of literature now treats faithfulness as a separate research problem. The central question is not whether a model can generate an explanation, but whether that explanation accurately reflects the mechanisms that produced the answer. [GitHub]github.comCo T Faithfulness SurveyYet a central…Read more…

How to read AI explanations without over-trusting them

The most useful approach is neither blind trust nor blanket scepticism.

Instead, treat AI explanations as evidence about the answer rather than definitive evidence about the model’s internal thought process.

A practical mindset includes several principles:

Evaluate the explanation itself. Does the reasoning make sense, or is it merely fluent?
Check whether conclusions follow from premises. Good writing can conceal logical gaps.
Look for external verification. In factual domains, independent sources remain important.
Be cautious with confidence. A confident explanation is not necessarily a correct one.
Distinguish usefulness from transparency. An explanation can help a user understand a topic even if it is not a perfect description of the model’s internal computation. [arXiv+2Anthropic]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

This distinction becomes especially important in high-stakes settings such as healthcare, law, finance, or scientific research, where persuasive language can create a false sense of reliability. [arXiv]arxiv.orgOpen source on arxiv.org.

Fluent Reasons illustration 3

Why this illusion strengthens the appearance of general intelligence

When a chatbot answers questions across many topics through one conversational interface, users already see a unified, seemingly capable agent. Fluent explanations amplify that impression.

An answer alone may look like successful pattern matching. An answer accompanied by a detailed rationale looks more like thinking. Because people instinctively associate explanations with understanding, the chatbot appears not merely knowledgeable but reflective. The explanation creates a sense of access to a mind at work. S Y N A P S I S [medicalhealthhumanities.com]medicalhealthhumanities.comfrom jane austen to chatbots using conversation to judge intelligencefrom jane austen to chatbots using conversation to judge intelligence

The evidence so far suggests caution. Fluent explanations can be useful, educational, and sometimes genuinely informative. However, they should not automatically be interpreted as a transparent window into an AI system’s reasoning process. The ability to generate convincing reasons and the ability to reveal actual reasons are related but distinct capabilities. Understanding that distinction is essential for understanding why modern chatbots can appear more generally intelligent than their underlying mechanisms may warrant. [arXiv+2Anthropic]arxiv.orgLanguage Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023…Published: May 7, 2023

Amazon book picks

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

Example eBay listing

Relax I'm A Doctor... Of Computer Science - PhD, Doctorate Mug

Search eBay.co.uk: computer science mug

Browse similar on eBay.co.uk

Example eBay listing

Funny Gift Awesome Retired COMPUTER SCIENCE TEACHER Mug | Retirement Humour Idea

Search eBay.co.uk: computer science mug

Browse similar on eBay.co.uk

Example eBay listing

Here Sits The Mug Of The World's Best Computer Science Student - Mug

Search eBay.co.uk: computer science mug

Browse similar on eBay.co.uk

Example eBay listing

Keep Calm I'm Studying Computer Science - Mug

Search eBay.co.uk: computer science mug

Browse similar on eBay.co.uk

Browse more on eBay.co.uk

Example items shown for inspiration; availability and pricing can change. Branchoria may earn a commission if you purchase through outbound eBay links.

Endnotes

Source: arxiv.org
Link: https://arxiv.org/abs/2305.04388
Source snippet
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought PromptingMay 7, 2023...

Published: May 7, 2023
Source: anthropic.com
Title: reasoning models dont say think
Link: https://www.anthropic.com/research/reasoning-models-dont-say-think
Source snippet
But our research shows that we can't always rely on what they tell us about their...Read more...
Source: medicalhealthhumanities.com
Title: from jane austen to chatbots using conversation to judge intelligence
Link: https://medicalhealthhumanities.com/2022/03/16/from-jane-austen-to-chatbots-using-conversation-to-judge-intelligence/
Source: diva-portal.org
Link: https://www.diva-portal.org/smash/get/diva2%3A1990130/FULLTEXT02
Source snippet
By examining how user...
Source: openreview.net
Link: https://openreview.net/forum?id=bzs4uPLXvi
Source snippet
Language Models Don't Always Say What They Thinkby M Turpin · Cited by 1450 — When we bias models toward incorrect answers, the...
Source: arxiv.org
Title: arXiv Faithful or Just Plausible?
Link: https://arxiv.org/html/2603.13988v1
Source snippet
Evaluating the Faithfulness of...Building on this concern, LLMs introduce a unique challenge: they generate natural-language rationales...
Source: arxiv.org
Title: arXiv Measuring Faithfulness in Chain-of-Thought Reasoning
Link: https://arxiv.org/abs/2307.13702
Source: time.com
Title: AI Chatbots Are Getting Better
Link: https://time.com/6238781/chatbot-chatgpt-ai-interview/
Source snippet
But an Interview With ChatGPT Reveals Their LimitsDecember 5, 2022 — In a recent interview, the revolutionary AI program named ChatGPT de...

Published: December 5, 2022
Source: aigi.ox.ac.uk
Link: https://aigi.ox.ac.uk/wp-content/uploads/2025/07/Cot_Is_Not_Explainability.pdf
Source snippet
ford Martin AIGIChain-of-Thought Is Not Explainabilityby F Barez · Cited by 92 — Chains-of-thought (CoT) allow language models to verba...
Source: arxiv.org
Link: https://arxiv.org/abs/2406.10625
Source: openreview.net
Link: https://openreview.net/forum?id=emjPKK11Oo
Source snippet
Previous...
Source: github.com
Title: Co T Faithfulness Survey
Link: https://github.com/PKU-PILLAR-Group/CoT-Faithfulness-Survey
Source snippet
Yet a central...Read more...
Source: openreview.net
Link: https://openreview.net/forum?id=1OyE9IK0kx
Source snippet
On the Hardness of Faithful Chain-of-Thought Reasoning...by SH Tanneru · Cited by 58 — We explore approaches to improve faithfulness of...
Source: arxiv.org
Link: https://arxiv.org/abs/2503.08679
Source snippet
Chain-of-Thought Reasoning In The Wild Is Not Always...by I Arcuschin · 2025 · Cited by 147 — Recent studies have shown that CoT reasoni...

Additional References

Source: researchgate.net
Link: https://www.researchgate.net/publication/401450628_Language_Models_Don%27t_Always_Say_What_They_Think_Unfaithful_Explanations_in_Chain-of-Thought_Prompting
Source snippet
Unfaithful Explanations in Chain-of-Thought Prompting24 May 2026 — Multiple studies show that generated explanations can diverge from the...

Published: May 2026
Source: ft.com
Link: https://www.ft.com/content/7a4e7eae-f004-486a-987f-4a2e4dbd34fb
Source snippet
These errors arise from the probabilistic way the models predict the next word in a sentence, sometimes leading to plausible yet incorrec...
Source: osf.io
Link: https://osf.io/wcu5m/overview
Source snippet
Users perceptions of chatbot bullshittingThis emphasis on linguistic [fluency]({{ 'fluency-vs-accuracy/' | relative_url }}) as an indicator for intelligence reflects a long-standing hu...
Source: reddit.com
Link: https://www.reddit.com/r/MachineLearning/comments/13k1ay3/r_language_models_dont_always_say_what_they_think/
Source snippet
[R] Language Models Don't Always Say What They ThinkWe find that CoT explanations can systematically misrepresent the true reason for a m...
Source: thesis.unipd.it
Link: https://thesis.unipd.it/retrieve/8195ad72-25cc-4e4d-a269-5e94261f3e05/AZHAR%20Serik-2.pdf
Source snippet
and conversational features of AI chatbots...In recent years, the SOR framework has been widely adapted to study human-computer interact...
Source: medium.com
Link: https://medium.com/%40iryna.nozdrin/the-unfaithful-chain-of-thought-debunking-anthropomorphic-claims-in-llm-research-f6981f998116
Source snippet
The “Unfaithful” Chain-of-ThoughtCan CoT Faithfulness be Reasonably Demanded? In the study, the researchers set out to test faithfulness...
Source: opentrain.ai
Link: https://www.opentrain.ai/papers/lie-to-me-how-faithful-is-chain-of-thought-reasoning-in-reasoning-models–arxiv-2603.22582/
Source snippet
Faithfulness of Chain-of-Thought in Reasoning Models23 Mar 2026 — Abstract. Chain-of-thought (CoT) reasoning has been proposed as a trans...
Source: awej.org
Link: https://awej.org/conversational-analysis-of-learner-ai-chatbot-interactions-in-developing-spoken-fluency/
Source snippet
AI Chatbot Interactions in Developing Spoken Fluency10 Dec 2025 — Abstract: This study investigates interactions between AI chatbots and...
Source: cobusgreyling.medium.com
Title: chain of thought reasoning is not always faithful d35848eb80f4
Link: https://cobusgreyling.medium.com/chain-of-thought-reasoning-is-not-always-faithful-d35848eb80f4
Source snippet
medium.comChain-of-Thought Reasoning Is Not Always FaithfulThis study reveals that Chain-of-Thought (CoT) reasoning in advanced Language...
Source: pub.towardsai.net
Title: In other words, AI may look like it’s reasoning carefully but
Link: https://pub.towardsai.net/when-ai-explains-itself-but-lies-the-hidden-pitfalls-of-chain-of-thought-reasoning-8dbeabdfab02
Source snippet
AI Explains Itself but Lies: The Hidden Pitfalls of...4 Sept 2025 — CoT explanations are often not faithful to the model's true reasonin...

When explanations sound more thoughtful than they are

Introduction

Why people treat articulate language as evidence of understanding

Why a convincing explanation can be misleading

How generated rationales can differ from actual computation

The model is trained to produce text, not introspection

Explanations can be reconstructed after the fact

Hidden influences may never appear in the explanation

Evidence from research on explanation faithfulness

How to read AI explanations without over-trusting them

Why this illusion strengthens the appearance of general intelligence

Further Reading

You Look Like a Thing and I Love You

Thinking, Fast and Slow

The Alignment Problem

Godel, Escher, Bach

Marketplace Samples

Relax I'm A Doctor... Of Computer Science - PhD, Doctorate Mug

Funny Gift Awesome Retired COMPUTER SCIENCE TEACHER Mug | Retirement Humour Idea

Here Sits The Mug Of The World's Best Computer Science Student - Mug

Keep Calm I'm Studying Computer Science - Mug

Endnotes

Additional References

Follow this branch

Parent topic

Related pages 2