Why Attention Maps Can Mislead

Attention maps can show information flow, but they should not be treated as complete proof of why a model made a decision.


Introduction

Attention heat maps are among the most popular visualisations used to explain how modern AI systems process language. In a Transformer model, attention weights show how strongly each token draws on every other token, which makes it tempting to read a bright cell or a thick line as direct evidence of the model’s reasoning. That interpretation is often too strong. Attention maps can reveal useful patterns of information routing, especially when examining how self-attention links distant words or concepts, but they do not provide a complete account of why a model reached a particular output. Research since 2019 has shown that attention patterns can be informative while still failing to serve as faithful explanations of model decisions. [Jain and Wallace, 2019]

What Attention Heat Maps Appear to Show

When a Transformer processes text, each token assigns attention weights to other tokens. Visualisations often display these weights as a heat map, where stronger attention appears darker, brighter, or thicker. The immediate impression is that the model is “looking at” the most important words.
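As a minimal sketch of where the numbers behind such a heat map come from, the snippet below computes scaled dot-product attention weights for a four-token toy sentence. The token list, the dimension of 8, and the random query and key matrices are all assumptions made for illustration; nothing here is taken from a real model.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)       # toy data, not taken from any real model
tokens = ["the", "cat", "sat", "down"]
Q = rng.normal(size=(len(tokens), 8))  # hypothetical query vectors
K = rng.normal(size=(len(tokens), 8))  # hypothetical key vectors

A = attention_weights(Q, K)          # A[i, j]: weight token i places on token j
print(np.round(A, 2))                # this matrix is what a heat map colours in
```

Each row of `A` sums to 1, so a heat map of one head in one layer is a picture of these row-wise distributions and nothing more.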

In many cases, attention heat maps genuinely reveal meaningful relationships. A pronoun may attend strongly to the noun it refers to, or a later sentence may attend to an earlier fact that helps determine its meaning. Such patterns help researchers understand how self-attention enables long-range connections that are difficult for older sequence models to capture. [IBM, “What is an attention mechanism?”]

The problem arises when readers move from “the model attended to this token” to “this token caused the decision”. Attention weights indicate how information is combined at a particular stage of computation, but the final prediction depends on many additional components, including value vectors, residual connections, feed-forward networks, layer interactions, and subsequent transformations. A heat map captures only one part of that process. [Abnar and Zuidema, 2020]
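One way to see why the weight alone is not the whole story is to look at what each token actually contributes once its value vector is taken into account. The sketch below uses made-up attention weights and value vectors (both hypothetical) and compares the attention weight with the norm of the weighted value.

```python
import numpy as np

# Hypothetical attention weights for one query token over three source tokens.
alpha = np.array([0.7, 0.2, 0.1])

# Hypothetical value vectors for those tokens (one row each).
V = np.array([
    [0.1, 0.0],   # heavily attended, but a small value vector
    [2.0, 1.0],   # lightly attended, but a large value vector
    [0.0, 0.5],
])

weighted = alpha[:, None] * V                         # contribution alpha_j * v_j
output = weighted.sum(axis=0)                         # what the head emits
contribution_norm = np.linalg.norm(weighted, axis=1)  # size of each contribution

print("attention weights :", alpha)
print("contribution norms:", np.round(contribution_norm, 3))
print("most attended token :", int(alpha.argmax()))
print("largest contribution:", int(contribution_norm.argmax()))
```

In this toy case the most attended token is token 0, yet token 1 supplies the largest share of the head's output, and that is before residual connections or feed-forward layers are considered.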

Why Similar Outputs Can Come From Different Patterns

One of the strongest challenges to treating attention maps as explanations comes from studies showing that very different attention distributions can sometimes produce nearly identical outputs.

A widely cited 2019 study by Sarthak Jain and Byron Wallace examined whether attention weights correspond to feature importance. Across multiple natural-language-processing tasks, they found only modest agreement between attention weights and other measures of importance, such as gradient-based scores. More strikingly, they were able to construct substantially different attention patterns that produced essentially the same predictions. Their conclusion was that standard attention weights should not automatically be treated as meaningful explanations. [Jain and Wallace, 2019]

This finding matters because explanations are often expected to be faithful, meaning that they reflect the computation that actually produced the output. If a heat map highlights one group of words, but a very different pattern of attention leads to the same answer, the original visualisation cannot be considered a complete explanation of the model’s behaviour. The highlighted tokens may be relevant, but they are not necessarily the unique or decisive reason for the prediction. [Jain and Wallace, 2019]
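The idea can be illustrated with a deliberately simple toy case. This is not the procedure used in the study, only a hand-built example with hypothetical value vectors in which two attention distributions with no overlap produce exactly the same output.

```python
import numpy as np

# Hypothetical value vectors; token 2 is the average of tokens 0 and 1.
V = np.array([
    [1.0, 0.0],   # token 0
    [0.0, 1.0],   # token 1
    [0.5, 0.5],   # token 2
])

alpha_a = np.array([0.5, 0.5, 0.0])   # attention split between tokens 0 and 1
alpha_b = np.array([0.0, 0.0, 1.0])   # attention entirely on token 2

out_a = alpha_a @ V
out_b = alpha_b @ V

# Total variation distance between the two distributions (1.0 is the maximum).
tvd = 0.5 * np.abs(alpha_a - alpha_b).sum()

print("output A:", out_a)
print("output B:", out_b)
print("distance between attention patterns:", tvd)
```

The two heat maps would look completely different, yet everything downstream of this head receives the same vector, so neither map can be the unique explanation of the result.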

An analogy is a football team scoring a goal. A replay might show the final pass, but the goal also depended on earlier positioning, movement, and decisions across the entire play. Focusing on the final pass alone captures part of the story without capturing all of it.

Why Raw Attention Misses Important Computation

Another limitation is that attention maps usually display only a single layer or a single attention head. Modern language models contain many layers, each with many heads operating in parallel.

Information does not simply travel from one token to another through a single attention connection. Each layer transforms representations before passing them onward. Residual connections allow information to bypass attention entirely, while multiple heads can focus on different relationships at the same time. By the time a prediction emerges, the model has combined information from numerous computational paths. [Abnar and Zuidema, 2020]

Researchers studying attention flow have argued that raw attention weights can be misleading because they ignore how information accumulates across layers. Methods such as attention rollout and attention flow were developed specifically to approximate how influence propagates through the network. These methods generally correlate better with other importance measures, such as ablation and input gradients, than raw attention visualisations do, suggesting that a single heat map often provides an incomplete picture. [Abnar and Zuidema, 2020]

In practical terms, a token that receives little attention in one layer may still strongly influence the final output through indirect paths that a simple heat map does not reveal.
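A minimal sketch of attention rollout, following the commonly described recipe of mixing each layer's attention with the identity (to account for the residual connection) and multiplying the results across layers. The function name, the assumption that heads are already averaged, and the two toy layers are illustrative choices, not details taken from any particular implementation.

```python
import numpy as np

def attention_rollout(layers):
    """Compose per-layer attention into an input-to-output influence estimate.

    layers: list of (n, n) row-stochastic matrices, lowest layer first,
            each already averaged over heads (an assumption of this sketch).
    """
    n = layers[0].shape[0]
    rollout = np.eye(n)
    for A in layers:
        A_res = 0.5 * A + 0.5 * np.eye(n)                  # mix in the residual path
        A_res = A_res / A_res.sum(axis=-1, keepdims=True)  # keep rows summing to 1
        rollout = A_res @ rollout                          # chain with layers below
    return rollout

# Toy 3-token, 2-layer example with made-up weights.
layer1 = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],    # token 1 reads from token 0
                   [0.0, 0.0, 1.0]])
layer2 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0]])   # token 2 reads from token 1 only

rollout = attention_rollout([layer1, layer2])
print("raw layer-2 weight, token 2 -> token 0:", layer2[2, 0])
print("rollout weight,     token 2 -> token 0:", rollout[2, 0])
```

Here the layer-2 heat map shows no link from token 2 to token 0, while rollout assigns that pair a weight of 0.25, because token 1 had already absorbed information from token 0 in the layer below.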

The Debate Is Not Completely Settled

The criticism of attention-based explanations has not ended the discussion. Shortly after the publication of “Attention is not Explanation”, other researchers argued that the issue depends partly on how “explanation” is defined.

A response paper titled “Attention is not not Explanation” contended that attention can still provide useful interpretive information when evaluated carefully and within the context of the entire model. Rather than asking whether attention supplies a perfect explanation, the authors suggested asking under what conditions attention offers reliable insight. [Wiegreffe and Pinter, 2019]

This distinction is important. The debate is not between “attention explains everything” and “attention explains nothing”. Instead, researchers increasingly view attention as one interpretability signal among many. It can reveal patterns that are difficult to see otherwise, but it should be cross-checked with other methods when strong explanatory claims are being made. [Wiegreffe and Pinter, 2019]

How to Read Attention as Routing, Not Reasoning

A useful way to interpret attention maps is to treat them as evidence of information routing rather than evidence of reasoning.

When a heat map shows that one token strongly attends to another, it suggests that information from the second token is being made available to the first token’s representation. That is a statement about communication within the network. It is not automatically a statement about causation, justification, or decision-making. [IBM, “What is an attention mechanism?”]

For readers examining model behaviour, several guidelines help:

  • View attention as a clue, not a proof. High attention indicates a connection, not necessarily the decisive factor behind an output.
  • Consider the whole model. Attention interacts with many other components that affect predictions.
  • Be cautious with single-layer visualisations. Influence often emerges through multi-layer pathways.
  • Look for corroboration. Gradient-based attribution, ablation studies, and attention-flow analyses can provide complementary evidence. [Abnar and Zuidema, 2020]
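The corroboration guideline can be sketched with a leave-one-out ablation on a single toy attention head. The weights and scalar values below are hypothetical, and `head_output` is a helper invented for this example. Each token is removed in turn, the remaining weights are renormalised, and the change in output is compared with the original attention weight.

```python
import numpy as np

alpha = np.array([0.6, 0.3, 0.1])     # hypothetical attention weights
v = np.array([0.0, 1.0, -3.0])        # hypothetical scalar value per token

def head_output(alpha, v, keep):
    """Attention output using only the tokens in `keep`, weights renormalised."""
    a = alpha[keep] / alpha[keep].sum()
    return float(a @ v[keep])

all_tokens = np.arange(len(alpha))
full = head_output(alpha, v, all_tokens)

# Absolute change in output when each token is removed in turn.
effect = np.array([
    abs(full - head_output(alpha, v, np.delete(all_tokens, i)))
    for i in all_tokens
])

print("attention weights:", alpha)
print("ablation effects :", np.round(effect, 3))
print("top by attention:", int(alpha.argmax()), "| top by ablation:", int(effect.argmax()))
```

In this toy case the most attended token has almost no ablation effect, while a less attended token changes the output the most. A disagreement like this is a signal to treat the heat map as a clue and look at further evidence before drawing conclusions.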

Under this interpretation, attention maps remain valuable. They help reveal how self-attention connects distant tokens and how information may travel through a Transformer. What they do not provide is a complete window into the model’s internal reasoning.

What Readers Should Take Away

Attention visualisations became popular because they offer an intuitive picture of how Transformers connect information across long distances. That picture is often useful, but it is only a partial view of the computation.

Research has repeatedly shown that attention weights do not always align with causal importance and that substantially different attention patterns can sometimes lead to the same prediction. At the same time, attention remains informative when treated as a description of information flow rather than a definitive explanation of model behaviour. [Jain and Wallace, 2019]

The safest interpretation is therefore a limited one: attention maps can help show where information is routed inside a model, but they should not be mistaken for complete proof of why the model reached a particular conclusion. [Jain and Wallace, 2019]

Endnotes

  1. Source: ibm.com
    Link: https://www.ibm.com/think/topics/attention-mechanism
    Source snippet

    What is an attention mechanism? An attention mechanism is a machine learning technique that directs deep learning models to prioritize...

  2. Source: arxiv.org
    Link: https://arxiv.org/abs/1902.10186
    Source snippet

    [1902.10186] Attention is not Explanation, by S Jain · 2019 · Cited by 2525 — Our findings show that standard attention modules do not prov...

  3. Source: liner.com
    Title: attention is not explanation
    Link: https://liner.com/review/attention-is-not-explanation
    Source snippet

    [Quick Review] 26 Feb 2019 — Regarding this NAACL 2019 paper, this review summarizes an evaluation of attention weights as explanations, f...

  4. Source: arxiv.org
    Link: https://arxiv.org/abs/2005.00928
    Source snippet

    [2005.00928] Quantifying Attention Flow in Transformers, by S Abnar · 2020 · Cited by 1658 — We propose two methods for approximating the a...

  5. Source: arxiv.org
    Title: Attention is not not Explanation
    Link: https://arxiv.org/abs/1908.04626
    Source snippet

    [1908.04626] Attention is not not Explanation, by S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that 'Attention is not Ex...

  6. Source: arxiv.org
    Link: https://arxiv.org/html/2503.03321v1
    Source snippet

    Visual Attention Sink in Large Multimodal Models, 5 Mar 2025 — LMMs have an extraordinary tendency to consistently allocate high attention...

  7. Source: liner.com
    Title: attention is not not explanation
    Link: https://liner.com/ko/review/attention-is-not-not-explanation
    Source snippet

    [Quick Paper Review] Aug 13, 2019 — Accordingly, this study develops alternative tests to determine when, and to what extent, the attention mechanism can be trusted as an explanation of RNN model predictions...

  8. Source: aclanthology.org
    Link: https://aclanthology.org/N19-1357/
    Source snippet

    ACL Anthology: Attention is not Explanation, by S Jain · 2019 · Cited by 2348 — Our findings show that standard attention modules do not prov...

  9. Source: aclanthology.org
    Title: Quantifying Attention Flow in Transformers
    Link: https://aclanthology.org/2020.acl-main.385/
    Source snippet

    ACL Anthology: Quantifying Attention Flow in Transformers, by S Abnar · 2020 · Cited by 1718 — We propose two methods for approximating the a...

  10. Source: aclanthology.org
    Title: Attention is not not Explanation
    Link: https://aclanthology.org/D19-1002/
    Source snippet

    ACL Anthology: Attention is not not Explanation, by S Wiegreffe · 2019 · Cited by 1699 — A recent paper claims that 'Attention is not Explana...

  11. Source: neural-mechanics.baulab.info
    Link: https://neural-mechanics.baulab.info/week7.html
    Source snippet

    Neural Mechanics, Week 7: Attribution. 4.2 Attention Rollout. Method: Attention Rollout (Abnar & Zuidema, 2020). Idea: Propagate attention we...

  12. Source: researchgate.net
    Link: https://www.researchgate.net/publication/331396991_Attention_is_not_Explanation
    Source snippet

    Attention is not Explanation | Request PDF. High attention weights do not guarantee that perturbing or removing a feature or neighbor will...

  13. Source: samiraabnar.github.io
    Title: attention flow
    Link: https://samiraabnar.github.io/articles/2020-04/attention_flow
    Source snippet

    Quantifying Attention Flow in Transformers, Apr 5, 2020 — I explain two simple but effective methods, called Attention Rollout and Attentio...

  14. Source: apxml.com
    Link: https://apxml.com/courses/how-to-build-a-large-language-model/chapter-23-analyzing-model-behavior/attention-map-visualization
    Source snippet

    Visualizing Transformer Attention Maps. Research has shown that attention weights might not always correlate strongly with other feature im...

  15. Source: github.com
    Link: https://github.com/sarahwie/attention
    Source snippet

    Code for EMNLP 2019 paper "Attention is not... We've based our repository on the code provided by Sarthak Jain & Byron Wallace for their...

Additional References

  1. Source: researchgate.net
    Link: https://www.researchgate.net/publication/336999161_Attention_is_not_not_Explanation
    Source snippet

    (PDF) Attention is not not Explanation. Jain and Wallace [6] argued that attention weights should not automatically be equated with explana...

  2. Source: medium.com
    Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017
    Source snippet

    Attention is not not Explanation. Existence does not Entail Exclusivity. On a theoretical level, attention scores are claimed to provide an...

  3. Source: bibbase.org
    Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019
    Source snippet

    Attention is not not Explanation. We propose four alternative tests to determine when/whether attention can be used as explanation: a simpl...

  4. Source: magnimindacademy.com
    Link: https://magnimindacademy.com/blog/the-mechanism-of-attention-in-large-language-models-a-comprehensive-guide/
    Source snippet

    The Mechanism of Attention in Large Language Models. In this read, we will explore the detail of how the LLM attention mechanisms work, the...

  5. Source: semanticscholar.org
    Link: https://www.semanticscholar.org/paper/Attention-is-not-Explanation-Jain-Wallace/1e83c20def5c84efa6d4a0d80aa3159f55cb9c3f
    Source snippet

    [PDF] Attention is not Explanation. This work performs extensive experiments across a variety of NLP tasks to assess the degree to which at...

  6. Source: semanticscholar.org
    Link: https://www.semanticscholar.org/paper/Quantifying-Attention-Flow-in-Transformers-Abnar-Zuidema/76a9f336481b39515d6cea2920696f11fb686451
    Source snippet

    [PDF] Quantifying Attention Flow in Transformers. This paper proposes two methods for approximating the attention to input tokens given att...

  7. Source: youtube.com
    Link: https://www.youtube.com/watch?v=OxCpWwDCDFQ
    Source snippet

    The Attention Mechanism in Large Language Models. Attention mechanisms are absolutely fascinating they are what helped large language model...

  8. Source: medium.com
    Title: Exploring Visual Attention in Transformer Models
    Link: https://medium.com/%40nivonl/exploring-visual-attention-in-transformer-models-ab538c06083a
    Source snippet

    Exploring Visual Attention in Transformer Models. The attention rollout method developed in Abnar & Zuidema(2020) is for language models in...

  9. Source: dsba.snu.ac.kr
    Title: [Paper Review] Attention is not (not) Explanation
    Link: https://dsba.snu.ac.kr/?kboard_content_redirect=1831
    Source snippet

    DSBA, Nov 2, 2021 — Paper title: Attention is not explanation (Jain and Wallace; EMNLP, 2019, cited 481 times). Link: https://arxiv.org/abs/1902.10186...

  10. Source: researchgate.net
    Link: https://www.researchgate.net/publication/343298597_Quantifying_Attention_Flow_in_Transformers
    Source snippet

    020), aim to trace how information flows through the model's attention layers...
