Can Two Very Different Attention Maps Mean the Same Thing?
Research shows that substantially different attention patterns can sometimes leave a model's prediction unchanged.
Introduction
One of the strongest pieces of evidence that attention weights can be misleading as explanations comes from a simple observation: a model can sometimes produce the same answer even when its attention map changes dramatically. If two very different patterns of attention lead to essentially the same prediction, then the attention map cannot be a unique explanation of why that prediction occurred. This finding has become a central argument in the debate over whether attention visualisations should be interpreted as faithful accounts of model reasoning. Rather than revealing a single decisive path to an answer, attention maps may represent just one of several computational routes available to the model. [arXiv: Attention is not Explanation]
In the broader discussion of why attention weights can mislead explanations, this evidence matters because explanations are expected to identify factors that genuinely drive a decision. If alternative attention patterns can be substituted without changing the outcome, the highlighted tokens or regions may be less essential than they appear. [arXiv: Attention is not Explanation]
Can Two Very Different Attention Maps Mean the Same Thing?
Research suggests that the answer is often yes.
A widely cited 2019 study by Sarthak Jain and Byron Wallace tested whether attention weights truly reflected the reasons behind model predictions. Among several experiments, they searched for alternative attention distributions that differed substantially from the model’s original attention pattern while preserving the same output. They found that such alternative distributions frequently existed. In other words, the model’s prediction remained stable even when attention was redirected to different parts of the input. [arXiv; ACL Anthology: Attention is not Explanation]
The significance of this result is easy to see through a thought experiment. Imagine a sentiment-analysis system classifying a review as positive. One attention map highlights words such as “excellent” and “wonderful”. A second, very different map focuses on other parts of the sentence, yet the model still predicts “positive” with nearly identical confidence. If both maps lead to the same result, it becomes difficult to argue that the first map uniquely explains the decision. [arXiv: Attention is not Explanation]
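The thought experiment can be made concrete with a toy calculation. The sketch below is not taken from any of the cited papers: the token values, the two attention maps, and the `predict_positive` helper are all hypothetical, chosen on the assumption that every position carries some positive signal, as contextualised hidden states often do.

```python
import math

def predict_positive(attention, values, weight=2.0):
    """Attention-weighted sum of scalar token values, passed through a sigmoid."""
    context = sum(a * v for a, v in zip(attention, values))
    return 1.0 / (1.0 + math.exp(-weight * context))

tokens = ["the", "food", "was", "excellent", "and", "wonderful"]

# Hypothetical scalar "hidden states": assumed similar across positions
# because each one already summarises its context.
values = [0.9, 1.0, 0.9, 1.1, 0.9, 1.0]

# Map A concentrates on "excellent" and "wonderful".
map_a = [0.02, 0.03, 0.02, 0.45, 0.03, 0.45]
# Map B concentrates on "the" and "food" instead.
map_b = [0.45, 0.45, 0.03, 0.02, 0.03, 0.02]

p_a = predict_positive(map_a, values)
p_b = predict_positive(map_b, values)

# Total variation distance between the two attention maps (0 = identical, 1 = disjoint).
tvd = 0.5 * sum(abs(a - b) for a, b in zip(map_a, map_b))

print(f"map A -> P(positive) = {p_a:.3f}")
print(f"map B -> P(positive) = {p_b:.3f}")
print(f"distance between maps = {tvd:.2f}")
```

In this toy setting the two maps are far apart, yet both yield a confident positive prediction, so neither map can claim to be the unique explanation.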
This does not mean that attention is meaningless. It means that a single attention visualisation may be only one possible description of the computation rather than a faithful account of what was necessary for the prediction. [ACL Anthology: Attention is not not Explanation]
Counterfactual Attention Experiments
The key evidence comes from counterfactual attention experiments.
In these studies, researchers deliberately alter attention weights and check whether the model’s output changes. If attention genuinely captured the causal basis of a decision, then major changes in attention should produce major changes in predictions. However, researchers often found the opposite: attention could be modified substantially while predictions remained nearly identical. [ACL Anthology: Attention is not Explanation]
The logic resembles a scientific intervention test:
- Observe the original attention pattern.
- Replace it with a markedly different pattern.
- Check whether the prediction changes.
When the prediction stays the same despite large changes in attention, the original attention map loses credibility as a complete explanation. The experiment reveals that multiple explanations, expressed as attention distributions, are compatible with the same behaviour. [arXiv: Attention is not Explanation]
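The three steps of the intervention test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the two-class toy model, its `hidden` states, and its `weights` are hypothetical. Shuffling the attention weights stands in for "a markedly different pattern", Jensen-Shannon divergence measures how far the attention moved, and total variation distance measures how far the output moved.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def jsd(p, q):
    """Jensen-Shannon divergence between two distributions (natural log)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def tvd(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def model_output(attention, hidden, weights):
    """Toy model: attention-weighted sum of hidden states, then a linear layer and softmax."""
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(attention, hidden)) for d in range(dim)]
    logits = [sum(w * c for w, c in zip(row, context)) for row in weights]
    return softmax(logits)

def intervention_test(attention, hidden, weights, trials=100, seed=0):
    rng = random.Random(seed)
    base = model_output(attention, hidden, weights)   # step 1: observe the original
    results = []
    for _ in range(trials):
        shuffled = attention[:]
        rng.shuffle(shuffled)                         # step 2: replace the pattern
        out = model_output(shuffled, hidden, weights)
        results.append((jsd(attention, shuffled), tvd(base, out)))  # step 3: compare
    return results

# Hypothetical inputs: four tokens whose hidden states are similar to each other.
hidden = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.1], [1.0, 0.15]]
weights = [[1.5, -0.5], [-1.0, 0.5]]
attention = [0.7, 0.1, 0.1, 0.1]

results = intervention_test(attention, hidden, weights)
print("largest attention shift (JSD):", round(max(j for j, _ in results), 3))
print("largest output shift (TVD):  ", round(max(t for _, t in results), 3))
```

A large attention shift paired with a small output shift is the pattern the paragraph above describes. In a model whose hidden states differ sharply between tokens, the same test would show the output moving with the attention.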
Importantly, these experiments do not prove that attention is useless. They show that attention alone is insufficient as evidence of causation. A heat map may indicate where information is flowing, but it does not necessarily identify the features that the model truly depends on. [arXiv: Attention is not Explanation]
Why Multiple Computational Paths Exist
The existence of different attention maps leading to the same answer is less surprising when considering how modern neural networks operate.
Transformer models contain many layers and many attention heads. Information can travel through numerous routes before reaching the final prediction. Different internal pathways may encode similar information, creating a form of redundancy. As a result, changing one attention pattern does not always remove the information needed for the task because that information may already be represented elsewhere in the network. [arXiv: Quantifying Attention Flow in Transformers]
Researchers studying attention flow have shown that information becomes increasingly mixed as it passes through layers. A token highlighted in one layer may already contain information gathered from many other tokens earlier in the computation. Consequently, several distinct attention configurations can produce representations that are functionally equivalent by the time the model generates its output. [arXiv; ACL Anthology: Quantifying Attention Flow in Transformers]
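One way to see this mixing is attention rollout, one of the two methods proposed in the attention-flow work cited here. The sketch below is a simplified version under stated assumptions: each layer is represented by a single attention matrix (heads already averaged), the residual connection is modelled by averaging that matrix with the identity, and the two example layers are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two square-compatible nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def attention_rollout(layers):
    """Multiply residual-adjusted attention matrices from the first layer upward."""
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Average with the identity to account for the residual connection.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        rollout = matmul(mixed, rollout)
    return rollout

# Hypothetical two-layer example with three tokens.
# Layer 1: tokens 1 and 2 draw almost everything from token 0.
layer1 = [[1.0, 0.0, 0.0],
          [0.9, 0.1, 0.0],
          [0.9, 0.0, 0.1]]
# Layer 2: the last position attends mostly to token 1.
layer2 = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.1, 0.8, 0.1]]

rollout = attention_rollout([layer1, layer2])
print("raw layer-2 attention for last position:", layer2[2])
print("rollout to input tokens for last position:", [round(x, 4) for x in rollout[2]])
```

In this example the raw layer-2 map points at token 1, while the rollout attributes the largest share to token 0, because token 1 had already absorbed token 0's information in the first layer.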
Another factor is that attention is only one component of the model. Feed-forward layers, residual connections, embeddings, and other mechanisms all contribute to the final prediction. Even if attention changes, these other components can preserve enough information for the answer to remain stable. [arXiv: Quantifying Attention Flow in Transformers]
What This Means for Model Explanations
The practical lesson is not that attention maps should be discarded. Rather, they should be interpreted with caution.
A useful explanation should help answer a counterfactual question: what would have changed the decision? If dramatically different attention patterns leave the decision unchanged, then attention alone cannot provide that answer. The map may show one way the model processed information, but not necessarily the factors that were indispensable for the outcome. [arXiv: Attention is not Explanation]
This insight has influenced a broader shift in AI interpretability research. Instead of relying solely on raw attention visualisations, researchers increasingly compare attention with other evidence, such as gradient-based importance measures, feature ablation tests, causal interventions, and attention-flow analyses. The goal is to determine not merely where information appears to move, but which inputs genuinely affect the prediction. [arXiv: Attention is not Explanation]
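A feature ablation test can be illustrated with a toy comparison between what attention highlights and what actually moves the prediction. Everything in the sketch below is hypothetical: the tokens, attention weights, scalar token values, and the simplified ablation (zeroing one token's value while holding attention fixed) are assumptions made for illustration, not a method from the cited papers.

```python
import math

def predict(attention, values):
    """Toy classifier: sigmoid of the attention-weighted sum of scalar token values."""
    context = sum(a * v for a, v in zip(attention, values))
    return 1.0 / (1.0 + math.exp(-context))

def ablation_importance(attention, values):
    """Change in the prediction when each token's value is zeroed in turn."""
    base = predict(attention, values)
    importances = []
    for i in range(len(values)):
        occluded = values[:i] + [0.0] + values[i + 1:]
        importances.append(abs(base - predict(attention, occluded)))
    return importances

tokens = ["the", "plot", "was", "dull"]
attention = [0.08, 0.61, 0.08, 0.23]   # hypothetical: the map highlights "plot"
values = [0.0, 0.1, 0.0, -3.0]         # hypothetical: "dull" carries the sentiment

importances = ablation_importance(attention, values)

top_by_attention = max(range(len(tokens)), key=lambda i: attention[i])
top_by_ablation = max(range(len(tokens)), key=lambda i: importances[i])

print("most attended token:        ", tokens[top_by_attention])
print("most influential by ablation:", tokens[top_by_ablation])
```

Here the heat map would highlight "plot", but removing "dull" is what changes the prediction most. Disagreement of this kind is why ablation and other intervention-based checks are used alongside attention.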
The debate remains active. Some researchers argue that attention can still be informative under carefully defined conditions and should not be dismissed entirely. Others maintain that the existence of radically different attention maps producing the same output fundamentally limits attention’s explanatory value. What both sides largely agree on is that a colourful attention heat map should not automatically be treated as a faithful explanation of model reasoning. [ACL Anthology; arXiv: Attention is not not Explanation]
The Key Takeaway
The finding that different attention maps can produce the same answer is one of the clearest demonstrations that attention is not the same thing as explanation. Counterfactual experiments show that a model’s prediction can remain stable even when attention shifts dramatically across the input. This suggests that attention visualisations often describe one possible computational route rather than the unique reason for a decision. For anyone trying to understand artificial intelligence systems, the lesson is straightforward: attention maps can provide clues, but explanations require evidence that the highlighted information actually mattered to the outcome. [arXiv; ACL Anthology: Attention is not Explanation]
Further Reading
Books for readers who want to go deeper than this article into whether attention maps can serve as explanations.
Natural Language Processing with Transformers
Explains the attention-based models whose explanations are being questioned.
The Alignment Problem
Connects unreliable explanations to the broader challenge of understanding model behaviour.
Deep Learning
Provides the background needed to understand multiple computational paths in neural networks.
Interpretable Machine Learning
Directly supports the page's question about whether explanations identify real causes.
Endnotes
-
Source: arxiv.org
Title: Attention is not Explanation
Link: https://arxiv.org/abs/1902.10186
Published: February 26, 2019
-
Source: arxiv.org
Link: https://arxiv.org/pdf/1902.10186
Source snippet: 1902.10186v3 [cs.CL] 8 May 2019, by S Jain, 2019. Under the assumption that attention weights are explanatory, such...
-
Source: arxiv.org
Title: Attention is not not Explanation
Link: https://arxiv.org/abs/1908.04626
Source snippet: We challenge many of the assumptions underlying this work...
-
Source: arxiv.org
Title: Quantifying Attention Flow in Transformers
Link: https://arxiv.org/abs/2005.00928
Source snippet: by S Abnar, 2020. In this paper, we consider...
Published: May 2, 2020
-
Source: arxiv.org
Title: Attention cannot be an Explanation
Link: https://arxiv.org/abs/2201.11194
-
Source: arxiv.org
Link: https://arxiv.org/html/2601.04398v4
Source snippet: Interpreting Transformers Through Attention Head... Feb 26, 2026. Initial transformer interpretability research treated attention weight...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2401.05744
Source snippet: Counterfactual Reasoning for Path-Based Explainable... by Y Li, 2024. Access Paper: View a PDF of the paper titled Attent...
-
Source: aclanthology.org
Title: Attention is not Explanation
Link: https://aclanthology.org/N19-1357.pdf
Source snippet: by S Jain, 2019. one might hope that if attention weights are peaky, then coun...
-
Source: aclanthology.org
Title: Attention is not not Explanation
Link: https://aclanthology.org/D19-1002/
Source snippet: by S Wiegreffe, 2019. A recent paper claims that 'Attention is not Explana...
-
Source: aclanthology.org
Title: Quantifying Attention Flow in Transformers
Link: https://aclanthology.org/2020.acl-main.385/
Source snippet: 2020. Quantifying Attention Flow in Transformers. In Proceedings of the 58th Annual Meeting of the Association for...
-
Source: aclanthology.org
Title: Is Attention Explanation? An Introduction to the Debate
Link: https://aclanthology.org/2022.acl-long.269.pdf
Source snippet: by A Bibal, 2022. (2020) theoretically show that attention weights in transformers can be...
-
Source: aclanthology.org
Title: Attention is not Explanation
Link: https://aclanthology.org/N19-1357/
Source snippet: by S Jain, 2019. Our findings show that standard attention modules do not provide meaningfu...
-
Source: aclanthology.org
Title: Quantifying Attention Flow in Transformers
Link: https://aclanthology.org/2020.acl-main.385.pdf
Source snippet: by S Abnar, 2020. In this paper, we consider the problem of quantifying this...
-
Source: Wikipedia
Title: Attention
Link: https://en.wikipedia.org/wiki/Attention
Source snippet: Attention is the concentration of awareness directed at some task or phenomenon while mostly excluding others. Focused attent...
-
Source: dictionary.cambridge.org
Link: https://dictionary.cambridge.org/dictionary/english/attention
Source snippet: English meaning, Cambridge Dictionary. to watch, listen to, or think about something or someone carefully or with interest...
-
Source: github.com
Link: https://github.com/sarahwie/attention
Source snippet: Code for EMNLP 2019 paper "Attention is not... We've based our repository on the code provided by Sarthak Jain & Byron Wallace for their...
-
Source: samiraabnar.github.io
Title: Attention flow
Link: https://samiraabnar.github.io/articles/2020-04/attention_flow
Source snippet: Quantifying Attention Flow in Transformers, 5 Apr 2020. I explain two simple but effective methods, called Attention Rollout and Attention...
Additional References
-
Source: merriam-webster.com
Link: https://www.merriam-webster.com/dictionary/attention
Source snippet: ATTENTION Definition & Meaning. 1. a: the act or state of applying the mind to something. Our attention was on the game. You s...
-
Source: openreview.net
Title: Attention Interpretability Across NLP Tasks
Link: https://openreview.net/pdf?id=BJe-_CNKPH
Source snippet: by S Vashishth. We also explain why attention weights are not interpretable wh...
-
Source: scribd.com
Link: https://www.scribd.com/document/539572843/1902-10186
Source snippet: Attention Mechanisms Lack Explanatory Power | PDF. 2) It is possible to construct different attention distributions that yield equiva...
-
Source: medium.com
Title: Attention is not not Explanation
Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017
Source snippet: Attention Distribution is not a Primitive. From a modeling perspective, detaching the attention scores ob...
-
Source: bibbase.org
Title: Attention is not not Explanation
Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019
Source snippet: We propose four alternative tests to determine when/whether attention can be used as explanation: a simpl...
-
Source: reddit.com
Title: [Discussion] is attention an explanation? (r/MachineLearning)
Link: https://www.reddit.com/r/MachineLearning/comments/1003d7w/discussion_is_attention_an_explanation/
Source snippet: Attention maps can be a type of explanation. It tells you what the model was...
-
Source: semanticscholar.org
Title: Quantifying Attention Flow in Transformers
Link: https://www.semanticscholar.org/paper/Quantifying-Attention-Flow-in-Transformers-Abnar-Zuidema/76a9f336481b39515d6cea2920696f11fb686451
Source snippet: This paper proposes two methods for approximating the attention to input tokens given att...
-
Source: pure.uva.nl
Link: https://pure.uva.nl/ws/files/178487922/2020.acl-main.385.pdf
Source snippet: UvA-DARE (Digital Academic Repository). In this paper, we consider the problem of quantifying this flow of information through self...
-
Source: github.com
Link: https://github.com/samiraabnar/attention_flow
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/336999161_Attention_is_not_not_Explanation
Source snippet: We show that even when reliable adversarial distributions can be found, they...