Can Two Very Different Attention Maps Mean the Same Thing?

Research shows that substantially different attention patterns can sometimes leave a model's prediction unchanged.

Introduction

One of the strongest pieces of evidence that attention weights can be misleading as explanations comes from a simple observation: a model can sometimes produce the same answer even when its attention map changes dramatically. If two very different patterns of attention lead to essentially the same prediction, then the attention map cannot be a unique explanation of why that prediction occurred. This finding has become a central argument in the debate over whether attention visualisations should be interpreted as faithful accounts of model reasoning. Rather than revealing a single decisive path to an answer, attention maps may represent just one of several computational routes available to the model. [arXiv: Attention is not Explanation]

In the broader discussion of why attention weights can mislead explanations, this evidence matters because explanations are expected to identify factors that genuinely drive a decision. If alternative attention patterns can be substituted without changing the outcome, the highlighted tokens or regions may be less essential than they appear. [arXiv: Attention is not Explanation]

Can Two Very Different Attention Maps Mean the Same Thing?

Research suggests that the answer is often yes.

A widely cited 2019 study by Sarthak Jain and Byron Wallace tested whether attention weights truly reflected the reasons behind model predictions. Among several experiments, they searched for alternative attention distributions that differed substantially from the model’s original attention pattern while preserving the same output. They found that such alternative distributions frequently existed. In other words, the model’s prediction remained stable even when attention was redirected to different parts of the input. [arXiv, ACL Anthology: Attention is not Explanation]
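
To make the idea of such a search concrete, here is a minimal sketch in Python. It is not the optimisation procedure used in the study; it is a brute-force stand-in that tries every reordering of a toy attention vector and keeps the one that differs most from the original (measured here by Jensen-Shannon divergence) while changing the predicted probability by no more than a small tolerance. The classifier, the per-token scores, the tolerance EPSILON, and all function names are hypothetical.

```python
import math
from itertools import permutations

def predict(attention, values):
    """Toy classifier: sigmoid of the attention-weighted sum of per-token scores."""
    context = sum(a * v for a, v in zip(attention, values))
    return 1.0 / (1.0 + math.exp(-context))

def kl(p, q):
    """Kullback-Leibler divergence in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: 0 for identical distributions, at most ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical per-token sentiment scores and original attention weights.
values = [2.0, 0.1, 0.0, 2.1, 0.2, 1.9, 2.2]
original = [0.45, 0.02, 0.01, 0.45, 0.02, 0.03, 0.02]
EPSILON = 0.01  # assumed tolerance on the change in predicted probability

baseline = predict(original, values)
best, best_jsd = original, 0.0
for candidate in permutations(original):
    # Keep only candidates that leave the prediction (almost) unchanged.
    if abs(predict(candidate, values) - baseline) <= EPSILON:
        distance = jsd(original, candidate)
        if distance > best_jsd:
            best, best_jsd = list(candidate), distance

print("original:   ", original, round(baseline, 4))
print("alternative:", best, round(predict(best, values), 4))
print("JSD (nats): ", round(best_jsd, 4))
```

In this toy setting an alternative exists because several tokens carry similar scores, so moving weight among them barely moves the output. Whether a real model behaves this way is an empirical question that the sketch does not answer.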

The significance of this result is easy to see through a thought experiment. Imagine a sentiment-analysis system classifying a review as positive. One attention map highlights words such as “excellent” and “wonderful”. A second, very different map focuses on other parts of the sentence, yet the model still predicts “positive” with nearly identical confidence. If both maps lead to the same result, it becomes difficult to argue that the first map uniquely explains the decision. [arXiv: Attention is not Explanation]

This does not mean that attention is meaningless. It means that a single attention visualisation may be only one possible description of the computation rather than a faithful account of what was necessary for the prediction. [ACL Anthology: Attention is not not Explanation]

Counterfactual Attention Experiments

The key evidence comes from counterfactual attention experiments.

In these studies, researchers deliberately alter attention weights while attempting to keep the model’s output unchanged. If attention genuinely captured the causal basis of a decision, then major changes in attention should produce major changes in predictions. Researchers often found the opposite: attention could be modified substantially while predictions remained nearly identical. [ACL Anthology: Attention is not Explanation]

The logic resembles a scientific intervention test:

  1. Observe the original attention pattern.
  2. Replace it with a markedly different pattern.
  3. Check whether the prediction changes.
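
The three steps can be sketched on a toy model as follows. Everything here is assumed for illustration: the single attention layer, the relevance scores, the per-token values, and the function names. A real experiment would intervene on a trained network's attention and rerun its forward pass.

```python
import math

def softmax(scores):
    """Convert raw relevance scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(values, scores, attention_override=None):
    """Toy single-layer model; optionally replace the attention it would compute."""
    attention = softmax(scores) if attention_override is None else attention_override
    context = sum(a * v for a, v in zip(attention, values))
    probability = 1.0 / (1.0 + math.exp(-context))
    return attention, probability

values = [2.0, 2.1, 0.1, 0.0, 1.8, 2.2]  # hypothetical per-token contributions
scores = [3.0, 3.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical query-key relevance scores

# Step 1: observe the original attention pattern and prediction.
original_attention, original_prob = forward(values, scores)

# Step 2: replace it with a markedly different pattern (here: reversed order).
alternative_attention = original_attention[::-1]
_, alternative_prob = forward(values, scores, alternative_attention)

# Step 3: check whether the prediction changes.
shift = sum(abs(a - b) for a, b in zip(original_attention, alternative_attention)) / 2
print(f"attention moved (total variation): {shift:.3f}")
print(f"prediction change: {abs(original_prob - alternative_prob):.4f}")
```

The reversed pattern was chosen because it moves weight onto tokens with similar values. An alternative that moved weight onto the near-zero tokens would change the prediction, so the test is about whether some very different pattern preserves the output, not whether every pattern does.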

When the prediction stays the same despite large changes in attention, the original attention map loses credibility as a complete explanation. The experiment reveals that multiple explanations, expressed as attention distributions, are compatible with the same behaviour. [arXiv: Attention is not Explanation]

Importantly, these experiments do not prove that attention is useless. They show that attention alone is insufficient as evidence of causation. A heat map may indicate where information is flowing, but it does not necessarily identify the features that the model truly depends on. [arXiv: Attention is not Explanation]

Why Multiple Computational Paths Exist

The existence of different attention maps leading to the same answer is less surprising when considering how modern neural networks operate.

Transformer models contain many layers and many attention heads. Information can travel through numerous routes before reaching the final prediction. Different internal pathways may encode similar information, creating a form of redundancy. As a result, changing one attention pattern does not always remove the information needed for the task because that information may already be represented elsewhere in the network. [arXiv: Quantifying Attention Flow in Transformers]

Researchers studying attention flow have shown that information becomes increasingly mixed as it passes through layers. A token highlighted in one layer may already contain information gathered from many other tokens earlier in the computation. Consequently, several distinct attention configurations can produce representations that are functionally equivalent by the time the model generates its output. [arXiv, ACL Anthology: Quantifying Attention Flow in Transformers]
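
The mixing effect can be illustrated with a simplified version of attention rollout, one of the methods associated with the attention-flow work cited above. The sketch below assumes a single head per layer, models the residual connection by averaging each attention matrix with the identity, and multiplies the layers together. The two matrices and the function names are invented for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_residual(attention):
    """Average with the identity so each position also keeps its own representation."""
    n = len(attention)
    return [[0.5 * attention[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def rollout(layers):
    """Compose per-layer attention (first layer first) into input-level attributions."""
    result = add_residual(layers[0])
    for attention in layers[1:]:
        result = matmul(add_residual(attention), result)
    return result

# Hypothetical raw attention for a 3-token input; row i = where position i attends.
layer1 = [[0.1, 0.8, 0.1],
          [0.1, 0.1, 0.8],
          [0.3, 0.3, 0.4]]
layer2 = [[0.9, 0.05, 0.05],
          [0.1, 0.8, 0.1],
          [0.1, 0.1, 0.8]]

rolled = rollout([layer1, layer2])
print("layer-2 raw attention, position 0:", layer2[0])
print("rollout to input tokens, position 0:", [round(x, 3) for x in rolled[0]])
```

In this example the second layer's raw attention for position 0 sits almost entirely on position 0 itself, yet roughly 40% of the rolled-out weight traces back to input token 1, because position 0 had already absorbed that token's information in the first layer.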

Another factor is that attention is only one component of the model. Feed-forward layers, residual connections, embeddings, and other mechanisms all contribute to the final prediction. Even if attention changes, these other components can preserve enough information for the answer to remain stable. [arXiv: Quantifying Attention Flow in Transformers]

What This Means for Model Explanations

The practical lesson is not that attention maps should be discarded. Rather, they should be interpreted with caution.

A useful explanation should help answer a counterfactual question: what would have changed the decision? If dramatically different attention patterns leave the decision unchanged, then attention alone cannot provide that answer. The map may show one way the model processed information, but not necessarily the factors that were indispensable for the outcome. [arXiv: Attention is not Explanation]

This insight has influenced a broader shift in AI interpretability research. Instead of relying solely on raw attention visualisations, researchers increasingly compare attention with other evidence, such as gradient-based importance measures, feature ablation tests, causal interventions, and attention-flow analyses. The goal is to determine not merely where information appears to move, but which inputs genuinely affect the prediction. [arXiv: Attention is not Explanation]
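
A feature ablation test of the kind mentioned above can be sketched in a few lines. The toy model, the tokens, their scores, and the attention vector are all assumptions, and zeroing a token's score is only one of several possible ways to ablate it. The point is simply to compare where attention is highest with which token's removal changes the output most.

```python
import math

def predict(attention, values):
    """Toy classifier: sigmoid of the attention-weighted sum of per-token scores."""
    logit = sum(a * v for a, v in zip(attention, values))
    return 1.0 / (1.0 + math.exp(-logit))

def leave_one_out(attention, values):
    """Change in predicted probability when each token's score is zeroed in turn."""
    baseline = predict(attention, values)
    effects = []
    for i in range(len(values)):
        ablated = values[:i] + [0.0] + values[i + 1:]
        effects.append(abs(baseline - predict(attention, ablated)))
    return effects

tokens = ["film", "great", "not"]  # hypothetical input
attention = [0.6, 0.3, 0.1]        # hypothetical attention weights
values = [0.2, 1.0, -4.0]          # hypothetical per-token scores

effects = leave_one_out(attention, values)
most_attended = tokens[attention.index(max(attention))]
most_influential = tokens[effects.index(max(effects))]
print("most attended:   ", most_attended)
print("most influential:", most_influential)
```

Here the most attended token and the most influential token differ, which is the kind of disagreement that a comparison between attention and ablation is meant to surface.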

The debate remains active. Some researchers argue that attention can still be informative under carefully defined conditions and should not be dismissed entirely. Others maintain that the existence of radically different attention maps producing the same output fundamentally limits attention’s explanatory value. What both sides largely agree on is that a colourful attention heat map should not automatically be treated as a faithful explanation of model reasoning. [ACL Anthology, arXiv: Attention is not not Explanation]

The Key Takeaway

The finding that different attention maps can produce the same answer is one of the clearest demonstrations that attention is not the same thing as explanation. Counterfactual experiments show that a model’s prediction can remain stable even when attention shifts dramatically across the input. This suggests that an attention visualisation may describe one possible computational route rather than the unique reason for a decision. For anyone trying to understand artificial intelligence systems, the lesson is straightforward: attention maps can provide clues, but explanations require evidence that the highlighted information actually mattered to the outcome. [arXiv, ACL Anthology: Attention is not Explanation]

Endnotes

  1. Source: arxiv.org
    Title: Attention is not Explanation
    Link: https://arxiv.org/abs/1902.10186
    Published: February 26, 2019

  2. Source: arxiv.org
    Title: Attention is not Explanation (PDF, v3, 8 May 2019), by S. Jain
    Link: https://arxiv.org/pdf/1902.10186

  3. Source: arxiv.org
    Title: Attention is not not Explanation
    Link: https://arxiv.org/abs/1908.04626

  4. Source: arxiv.org
    Title: Quantifying Attention Flow in Transformers, by S. Abnar
    Link: https://arxiv.org/abs/2005.00928
    Published: May 2, 2020

  5. Source: arxiv.org
    Title: Attention cannot be an Explanation
    Link: https://arxiv.org/abs/2201.11194

  6. Source: arxiv.org
    Title: Interpreting Transformers Through Attention Head...
    Link: https://arxiv.org/html/2601.04398v4
    Published: February 26, 2026

  7. Source: arxiv.org
    Title: Counterfactual Reasoning for Path-Based Explainable..., by Y. Li, 2024
    Link: https://arxiv.org/abs/2401.05744

  8. Source: aclanthology.org
    Title: Attention is not Explanation (PDF), by S. Jain, 2019
    Link: https://aclanthology.org/N19-1357.pdf

  9. Source: aclanthology.org
    Title: Attention is not not Explanation, by S. Wiegreffe, 2019
    Link: https://aclanthology.org/D19-1002/

  10. Source: aclanthology.org
    Title: Quantifying Attention Flow in Transformers, 2020. In Proceedings of the 58th Annual Meeting of the Association for...
    Link: https://aclanthology.org/2020.acl-main.385/

  11. Source: aclanthology.org
    Title: Is Attention Explanation? An Introduction to the Debate (PDF), by A. Bibal, 2022
    Link: https://aclanthology.org/2022.acl-long.269.pdf

  12. Source: aclanthology.org
    Title: Attention is not Explanation, by S. Jain, 2019
    Link: https://aclanthology.org/N19-1357/

  13. Source: aclanthology.org
    Title: Quantifying Attention Flow in Transformers (PDF), by S. Abnar, 2020
    Link: https://aclanthology.org/2020.acl-main.385.pdf

  14. Source: Wikipedia
    Title: Attention
    Link: https://en.wikipedia.org/wiki/Attention

  15. Source: dictionary.cambridge.org
    Title: Attention, English meaning (Cambridge Dictionary)
    Link: https://dictionary.cambridge.org/dictionary/english/attention

  16. Source: github.com
    Title: Code for EMNLP 2019 paper "Attention is not..."
    Link: https://github.com/sarahwie/attention

  17. Source: samiraabnar.github.io
    Title: Quantifying Attention Flow in Transformers
    Link: https://samiraabnar.github.io/articles/2020-04/attention_flow
    Published: 5 Apr 2020
