Within Self attention
Why Attention Maps Can Mislead
Attention maps can show information flow, but they should not be treated as complete proof of why a model made a decision.
On this page
- What attention heat maps appear to show
- Why similar outputs can come from different patterns
- How to read attention as routing, not reasoning
Page outline Jump by section
Introduction
Attention heat maps are among the most popular visualisations used to explain how modern AI systems process language. In a Transformer model, attention weights show which tokens are connected to which other tokens, making it tempting to interpret a bright line or dark square as direct evidence of the model’s reasoning. That interpretation is often too strong. Attention maps can reveal useful patterns of information routing, especially when examining how self-attention links distant words or concepts, but they do not provide a complete account of why a model reached a particular output. Research over the past several years has shown that attention patterns can be informative while still failing to serve as faithful explanations of model decisions. [ACL Anthology+2ACL Anthology]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov…
What Attention Heat Maps Appear to Show
When a Transformer processes text, each token assigns attention weights to other tokens. Visualisations often display these weights as a heat map, where stronger attention appears darker, brighter, or thicker. The immediate impression is that the model is “looking at” the most important words.
In many cases, these diagrams genuinely reveal meaningful relationships. A pronoun may attend strongly to the noun it refers to, or a later sentence may attend to an earlier fact that helps determine its meaning. Such patterns help researchers understand how self-attention enables long-range connections that would be difficult for older sequence models. [IBM]ibm.comWhat is an attention mechanism?An attention mechanism is a machine learning technique that directs deep learning models to prioritize…
The problem arises when readers move from “the model attended to this token” to “this token caused the decision”. Attention weights indicate how information is combined at a particular stage of computation, but the final prediction depends on many additional components, including value vectors, residual connections, feed-forward networks, layer interactions, and subsequent transformations. A heat map captures only one part of that process. [ACL Anthology+2Neural Mechanics]aclanthology.org2020.acl main.385ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a…
Why Similar Outputs Can Come From Different Patterns
One of the strongest challenges to treating attention maps as explanations comes from studies showing that very different attention distributions can sometimes produce nearly identical outputs.
A widely cited 2019 study by Sarthak Jain and Byron Wallace examined whether attention weights correspond to feature importance. Across multiple natural-language-processing tasks, they found only modest agreement between attention weights and other measures of importance. More strikingly, they were able to construct substantially different attention patterns that produced essentially the same predictions. Their conclusion was that standard attention weights should not automatically be treated as meaningful explanations. [ACL Anthology+2arXiv]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov…
This finding matters because explanations are often expected to be faithful. If a heat map highlights one group of words, but a very different pattern of attention leads to the same answer, the original visualisation cannot be considered a complete explanation of the model’s behaviour. The highlighted tokens may be relevant, but they are not necessarily the unique or decisive reason for the prediction. [ACL Anthology+2Liner]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov…
An analogy is a football team scoring a goal. A replay might show the final pass, but the goal also depended on earlier positioning, movement, and decisions across the entire play. Focusing on the final pass alone captures part of the story without capturing all of it.
Why Raw Attention Misses Important Computation
Another limitation is that attention maps usually display only a single layer or attention head. Modern language models contain many layers and many heads operating simultaneously.
Information does not simply travel from one token to another through a single attention connection. Each layer transforms representations before passing them onward. Residual connections allow information to bypass attention entirely, while multiple heads can focus on different relationships at the same time. By the time a prediction emerges, the model has combined information from numerous computational paths. [ACL Anthology+2Samira Abnar]aclanthology.org2020.acl main.385ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a…
Researchers studying attention flow have argued that raw attention weights can be misleading because they ignore how information accumulates across layers. Methods such as attention rollout and attention flow were developed specifically to approximate how influence propagates through the network. These methods generally correlate better with other importance measures than raw attention visualisations do, suggesting that a single heat map often provides an incomplete picture. [ACL Anthology+2arXiv]aclanthology.org2020.acl main.385ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a…
In practical terms, a token that receives little attention in one layer may still strongly influence the final output through indirect paths that a simple heat map does not reveal.
The Debate Is Not Completely Settled
The criticism of attention-based explanations has not ended the discussion. Shortly after the publication of Attention is not Explanation, other researchers argued that the issue depends partly on how “explanation” is defined.
A response paper titled Attention is not not Explanation contended that attention can still provide useful interpretive information when evaluated carefully and within the context of the entire model. Rather than asking whether attention supplies a perfect explanation, the authors suggested asking under what conditions attention offers reliable insight. [ACL Anthology+2arXiv]aclanthology.orgWe challenge many of the assumptions underlying this work.Read moreACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that 'Attention is not Explana…
This distinction is important. The debate is not between “attention explains everything” and “attention explains nothing”. Instead, researchers increasingly view attention as one interpretability signal among many. It can reveal patterns that are difficult to see otherwise, but it should be cross-checked with other methods when strong explanatory claims are being made. [ACL Anthology+2arXiv]aclanthology.orgWe challenge many of the assumptions underlying this work.Read moreACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that 'Attention is not Explana…
How to Read Attention as Routing, Not Reasoning
A useful way to interpret attention maps is to treat them as evidence of information routing rather than evidence of reasoning.
When a heat map shows that one token strongly attends to another, it suggests that information from the second token is being made available to the first token’s representation. That is a statement about communication within the network. It is not automatically a statement about causation, justification, or decision-making. [IBM]ibm.comWhat is an attention mechanism?An attention mechanism is a machine learning technique that directs deep learning models to prioritize…
For readers examining model behaviour, several guidelines help:
- View attention as a clue, not a proof. High attention indicates a connection, not necessarily the decisive factor behind an output.
- Consider the whole model. Attention interacts with many other components that affect predictions.
- Be cautious with single-layer visualisations. Influence often emerges through multi-layer pathways.
- Look for corroboration. Gradient-based attribution, ablation studies, and attention-flow analyses can provide complementary evidence. ApX Machine Learning+3ACL Anthology+3Neural Mechanics [aclanthology.org]aclanthology.org2020.acl main.385ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a…
Under this interpretation, attention maps remain valuable. They help reveal how self-attention connects distant tokens and how information may travel through a Transformer. What they do not provide is a complete window into the model’s internal reasoning.
What Readers Should Take Away
Attention visualisations became popular because they offer an intuitive picture of how Transformers connect information across long distances. That picture is often useful, but it is only a partial view of the computation.
Research has repeatedly shown that attention weights do not always align with causal importance and that substantially different attention patterns can sometimes lead to the same prediction. At the same time, attention remains informative when treated as a description of information flow rather than a definitive explanation of model behaviour. [ACL Anthology+3ACL Anthology+3Liner]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov…
The safest interpretation is therefore a limited one: attention maps can help show where information is routed inside a model, but they should not be mistaken for complete proof of why the model reached a particular conclusion. [ACL Anthology+2ACL Anthology]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov…
Amazon book picks
Further Reading
Books and field guides related to Why Attention Maps Can Mislead. Use these as the next step if you want deeper reading beyond the article.
Transformers for Machine Learning
Provides context for attention visualisations and limitations.
The Book of Why
Useful for understanding why correlation-like signals are not explanations.
Interpretable Machine Learning
Directly relevant to the limits of interpreting attention maps.
Endnotes
-
Source: ibm.com
Link: https://www.ibm.com/think/topics/attention-mechanismSource snippet
What is an attention mechanism?An attention mechanism is a [machine learning]({{ 'machine-learning/' | relative_url }}) technique that directs deep learning models to prioritize...
-
Source: arxiv.org
Link: https://arxiv.org/abs/1902.10186Source snippet
[1902.10186] Attention is not Explanationby S Jain · 2019 · Cited by 2525 — Our findings show that standard attention modules do not prov...
-
Source: liner.com
Title: attention is not explanation
Link: https://liner.com/review/attention-is-not-explanationSource snippet
[Quick Review]26 Feb 2019 — Regarding this NAACL 2019 paper, this review summarizes an evaluation of attention weights as explanations, f...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2005.00928Source snippet
[2005.00928] Quantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1658 — We propose two methods for approximating the a...
-
Source: arxiv.org
Title: We challenge many of the assumptions underlying this work.Read more
Link: https://arxiv.org/abs/1908.04626Source snippet
arXiv[1908.04626] Attention is not not Explanationby S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that `Attention is not Ex...
-
Source: arxiv.org
Link: https://arxiv.org/html/2503.03321v1Source snippet
Visual Attention Sink in Large Multimodal Models5 Mar 2025 — LMMs have an extraordinary tendency to consistently allocate high attention...
-
Source: liner.com
Title: attention is not not explanation
Link: https://liner.com/ko/review/attention-is-not-not-explanationSource snippet
[논문 퀵 리뷰]Aug 13, 2019 — 따라서 본 연구는 attention mechanism이 RNN 모델 예측에 대한 설명으로 언제, 그리고 얼마나 신뢰할 수 있는지 판단하기 위한 대안 테스트를 개발하고...Read more...
-
Source: aclanthology.org
Link: https://aclanthology.org/N19-1357/Source snippet
ACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov...
-
Source: aclanthology.org
Title: 2020.acl main.385
Link: https://aclanthology.org/2020.acl-main.385/Source snippet
ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a...
-
Source: aclanthology.org
Title: We challenge many of the assumptions underlying this work.Read more
Link: https://aclanthology.org/D19-1002/Source snippet
ACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that 'Attention is not Explana...
-
Source: neural-mechanics.baulab.info
Link: https://neural-mechanics.baulab.info/week7.htmlSource snippet
Neural MechanicsWeek 7: Attribution4.2 Attention Rollout. Method: Attention Rollout (Abnar & Zuidema, 2020). Idea: Propagate attention we...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/331396991_Attention_is_not_ExplanationSource snippet
Attention is not Explanation | Request PDFHigh attention weights do not guarantee that perturbing or removing a feature or neighbor will...
-
Source: samiraabnar.github.io
Title: attention flow
Link: https://samiraabnar.github.io/articles/2020-04/attention_flowSource snippet
Quantifying Attention Flow in TransformersApr 5, 2020 — I explain two simple but effective methods, called Attention Rollout and Attentio...
-
Source: apxml.com
Link: https://apxml.com/courses/how-to-build-a-large-language-model/chapter-23-analyzing-model-behavior/attention-map-visualizationSource snippet
Visualizing Transformer Attention MapsResearch has shown that attention weights might not always correlate strongly with other feature im...
-
Source: github.com
Link: https://github.com/sarahwie/attentionSource snippet
Code for EMNLP 2019 paper "Attention is not...We've based our repository on the code provided by Sarthak Jain & Byron Wallace for their...
Additional References
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/336999161_Attention_is_not_not_ExplanationSource snippet
(PDF) Attention is not not ExplanationJain and Wallace [6] argued that attention weights should not automatically be equated with explana...
-
Source: medium.com
Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017Source snippet
Attention is not not ExplanationExistence does not Entail Exclusivity. On a theoretical level, attention scores are claimed to provide an...
-
Source: bibbase.org
Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019Source snippet
Attention is not not ExplanationWe propose four alternative tests to determine when/whether attention can be used as explanation: a simpl...
-
Source: magnimindacademy.com
Link: https://magnimindacademy.com/blog/the-mechanism-of-attention-in-large-language-models-a-comprehensive-guide/Source snippet
The Mechanism of Attention in Large Language ModelsIn this read, we will explore the detail of how the LLM attention mechanisms work, the...
-
Source: semanticscholar.org
Link: https://www.semanticscholar.org/paper/Attention-is-not-Explanation-Jain-Wallace/1e83c20def5c84efa6d4a0d80aa3159f55cb9c3fSource snippet
[PDF] Attention is not ExplanationThis work performs extensive experiments across a variety of NLP tasks to assess the degree to which at...
-
Source: semanticscholar.org
Link: https://www.semanticscholar.org/paper/Quantifying-Attention-Flow-in-Transformers-Abnar-Zuidema/76a9f336481b39515d6cea2920696f11fb686451Source snippet
[PDF] Quantifying Attention Flow in TransformersThis paper proposes two methods for approximating the attention to input tokens given att...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=OxCpWwDCDFQSource snippet
The Attention Mechanism in Large Language ModelsAttention mechanisms are absolutely fascinating they are what helped large language model...
-
Source: medium.com
Title: exploring visual attention in transformer models ab538c06083a
Link: https://medium.com/%40nivonl/exploring-visual-attention-in-transformer-models-ab538c06083aSource snippet
Exploring Visual Attention in Transformer ModelsThe attention rollout method developed in Abnar & Zuidema(2020) is for language models in...
-
Source: dsba.snu.ac.kr
Title: snu.ac.kr[Paper Review] Attention is not (not) Explanation
Link: https://dsba.snu.ac.kr/?kboard_content_redirect=1831Source snippet
"DSBANov 2, 2021 — 논문 제목: Attention is not explanation (Jain and Wallace; EMNLP, 2019, 481회 인용). 링크: [https://arxiv.org/abs/1902.10186.Read..."](https://arxiv.org/abs/1902.10186.Read...")...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/343298597_Quantifying_Attention_Flow_in_TransformersSource snippet
020), aim to trace how information flows through the model's attention layers.Read more...
Topic Tree


