Can Two Very Different Attention Maps Mean the Same Thing?

Introduction

One of the strongest pieces of evidence that attention weights can be misleading as explanations comes from a simple observation: a model can sometimes produce the same answer even when its attention map changes dramatically. If two very different patterns of attention lead to essentially the same prediction, then the attention map cannot be a unique explanation of why that prediction occurred. This finding has become a central argument in the debate over whether attention visualisations should be interpreted as faithful accounts of model reasoning. Rather than revealing a single decisive path to an answer, attention maps may represent just one of several computational routes available to the model. [arXiv: Attention is not Explanation, 2019]

In the broader discussion of why attention weights can mislead explanations, this evidence matters because explanations are expected to identify factors that genuinely drive a decision. If alternative attention patterns can be substituted without changing the outcome, the highlighted tokens or regions may be less essential than they appear. [arXiv: Attention is not Explanation, 2019]

Can Two Very Different Attention Maps Mean the Same Thing?

Research suggests that the answer is often yes.

A widely cited 2019 study by Sarthak Jain and Byron Wallace tested whether attention weights truly reflected the reasons behind model predictions. Among several experiments, they searched for alternative attention distributions that differed substantially from the model’s original attention pattern while preserving the same output. They found that such alternative distributions frequently existed. In other words, the model’s prediction remained stable even when attention was redirected to different parts of the input. [arXiv; ACL Anthology: Attention is not Explanation, 2019]

The significance of this result is easy to see through a thought experiment. Imagine a sentiment-analysis system classifying a review as positive. One attention map highlights words such as “excellent” and “wonderful”. A second, very different map focuses on other parts of the sentence, yet the model still predicts “positive” with nearly identical confidence. If both maps lead to the same result, it becomes difficult to argue that the first map uniquely explains the decision. [arXiv: Attention is not Explanation, 2019]
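The thought experiment can be made concrete with a toy model. The sketch below is illustrative only: the tokens, the per-token scores, and both attention maps are invented, and the `predict` function is a hypothetical stand-in for a real classifier. It assumes that, after contextualisation, most tokens already carry similar positive evidence, which is what allows two very different maps to agree.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(attention, token_scores):
    """Attention-weighted sum of per-token scores, squashed to a probability."""
    context = sum(a * s for a, s in zip(attention, token_scores))
    return sigmoid(context)


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


tokens = ["the", "food", "was", "excellent", "and", "wonderful"]

# ASSUMPTION: hypothetical contextualised scores. Every token already
# carries similar positive evidence because earlier layers mixed information.
token_scores = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0]

# Map A concentrates on "excellent" and "wonderful".
map_a = [0.02, 0.03, 0.02, 0.50, 0.03, 0.40]
# Map B concentrates on "the" and "food" instead.
map_b = [0.40, 0.45, 0.10, 0.02, 0.02, 0.01]

p_a = predict(map_a, token_scores)
p_b = predict(map_b, token_scores)
map_distance = total_variation(map_a, map_b)

print(f"P(positive) under map A: {p_a:.3f}")
print(f"P(positive) under map B: {p_b:.3f}")
print(f"Distance between the two maps: {map_distance:.2f}")
```

In this toy setting the two maps are far apart while the two probabilities are close, so neither map can claim to be the unique reason for the "positive" label.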

This does not mean that attention is meaningless. It means that a single attention visualisation may be only one possible description of the computation rather than a faithful account of what was necessary for the prediction. [ACL Anthology: Attention is not not Explanation, 2019]

Counterfactual Attention Experiments

The key evidence comes from counterfactual attention experiments.

In these studies, researchers deliberately alter attention weights while attempting to keep the model’s output unchanged. If attention genuinely captures the causal basis of a decision, then major changes in attention should produce major changes in predictions. However, researchers often found the opposite: attention could be modified substantially while predictions remained nearly identical. [ACL Anthology: Attention is not Explanation, 2019]

The logic resembles a scientific intervention test:

Observe the original attention pattern.
Replace it with a markedly different pattern.
Check whether the prediction changes.
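The three steps above can be sketched as a small permutation test. This is a minimal illustration, not the code used in the cited study: the `toy_model`, its hidden vectors, and its classifier weights are all hypothetical, and random shuffling is just one simple way to produce a markedly different attention pattern. The hidden vectors are deliberately similar to one another, which is the redundancy assumption that makes the prediction stable.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p, q):
    """Half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def toy_model(attention, hidden, weights):
    """Attention-pool the hidden vectors, then apply a linear classifier."""
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(attention, hidden)) for d in range(dim)]
    logits = [sum(w * c for w, c in zip(row, context)) for row in weights]
    return softmax(logits)


def permutation_test(attention, hidden, weights, trials=100, seed=0):
    """Step 1: observe. Step 2: shuffle the attention. Step 3: compare outputs."""
    rng = random.Random(seed)
    original = toy_model(attention, hidden, weights)
    results = []
    for _ in range(trials):
        shuffled = attention[:]
        rng.shuffle(shuffled)
        attention_shift = total_variation(attention, shuffled)
        output_shift = total_variation(original, toy_model(shuffled, hidden, weights))
        results.append((attention_shift, output_shift))
    return original, results


# ASSUMPTION: hypothetical hidden states that are nearly interchangeable.
hidden = [[1.0, 0.2], [0.9, 0.3], [1.1, 0.1], [1.0, 0.25], [0.95, 0.2]]
weights = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical two-class classifier
attention = [0.7, 0.1, 0.1, 0.05, 0.05]  # peaked on the first token

original, results = permutation_test(attention, hidden, weights)
largest_attention_shift = max(a for a, _ in results)
largest_output_shift = max(o for _, o in results)

print(f"Original prediction: {original}")
print(f"Largest change in attention: {largest_attention_shift:.2f}")
print(f"Largest change in output:    {largest_output_shift:.3f}")
```

If the hidden vectors were made very different from one another, the same test would report large output shifts, which is the case where a peaked attention map is more likely to reflect something the prediction depends on.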

When the prediction stays the same despite large changes in attention, the original attention map loses credibility as a complete explanation. The experiment reveals that multiple explanations, expressed as attention distributions, are compatible with the same behaviour. [arXiv: Attention is not Explanation, 2019]

Importantly, these experiments do not prove that attention is useless. They show that attention alone is insufficient as evidence of causation. A heat map may indicate where information is flowing, but it does not necessarily identify the features that the model truly depends on. [arXiv: Attention is not Explanation, 2019]

Why Multiple Computational Paths Exist

The existence of different attention maps leading to the same answer is less surprising when considering how modern neural networks operate.

Transformer models contain many layers and many attention heads. Information can travel through numerous routes before reaching the final prediction. Different internal pathways may encode similar information, creating a form of redundancy. As a result, changing one attention pattern does not always remove the information needed for the task because that information may already be represented elsewhere in the network. [arXiv: Quantifying Attention Flow in Transformers, 2020]

Researchers studying attention flow have shown that information becomes increasingly mixed as it passes through layers. A token highlighted in one layer may already contain information gathered from many other tokens earlier in the computation. Consequently, several distinct attention configurations can produce representations that are functionally equivalent by the time the model generates its output. [arXiv; ACL Anthology: Quantifying Attention Flow in Transformers, 2020]
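One way to see this mixing is attention rollout, one of the methods proposed in the attention-flow work cited above. The sketch below is a simplified version under stated assumptions: each layer is represented by a single attention matrix (for example, an average over heads), the residual connection is modelled by mixing each matrix equally with the identity, and the two example layers are invented.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def add_residual(attn):
    """Mix attention with the identity (half each) to model the residual path.

    Rows of the result still sum to 1 when rows of `attn` do.
    """
    n = len(attn)
    return [
        [0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def attention_rollout(layers):
    """Multiply residual-adjusted attention matrices, later layers on the left."""
    rollout = add_residual(layers[0])
    for attn in layers[1:]:
        rollout = matmul(add_residual(attn), rollout)
    return rollout


# ASSUMPTION: a hypothetical 3-token, 2-layer example.
# Row i holds how much token i attends to each token j.
layer1 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.0, 0.2],  # token 2 gathers information from token 0
]
layer2 = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],  # token 1 attends only to token 2
    [0.0, 0.0, 1.0],
]

rollout = attention_rollout([layer1, layer2])
for i, row in enumerate(rollout):
    print(f"token {i} draws on inputs:", [round(v, 3) for v in row])
```

In layer 2, token 1 places zero raw attention on token 0, yet the rollout row for token 1 gives token 0 a non-zero share, because token 2 had already absorbed information from token 0 in layer 1. A raw heat map of layer 2 alone would hide that route.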

Another factor is that attention is only one component of the model. Feed-forward layers, residual connections, embeddings, and other mechanisms all contribute to the final prediction. Even if attention changes, these other components can preserve enough information for the answer to remain stable. [arXiv: Quantifying Attention Flow in Transformers, 2020]

What This Means for Model Explanations

The practical lesson is not that attention maps should be discarded. Rather, they should be interpreted with caution.

A useful explanation should help answer a counterfactual question: what would have changed the decision? If dramatically different attention patterns leave the decision unchanged, then attention alone cannot provide that answer. The map may show one way the model processed information, but not necessarily the factors that were indispensable for the outcome. [arXiv: Attention is not Explanation, 2019]

This insight has influenced a broader shift in AI interpretability research. Instead of relying solely on raw attention visualisations, researchers increasingly compare attention with other evidence, such as gradient-based importance measures, feature ablation tests, causal interventions, and attention-flow analyses. The goal is to determine not merely where information appears to move, but which inputs genuinely affect the prediction. [arXiv: Attention is not Explanation, 2019]
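A feature ablation test, one of the checks listed above, can be sketched in a few lines. Everything here is hypothetical: the tokens, their scores, the attention weights, and the choice to renormalise the remaining attention after a token is removed. The point is only to show how an attention ranking and an ablation ranking can disagree.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(attention, scores):
    """Attention-weighted sum of per-token scores, squashed to a probability."""
    return sigmoid(sum(a * s for a, s in zip(attention, scores)))


def ablate(attention, scores, index):
    """Drop one token, renormalise the remaining attention, and re-predict."""
    kept = [(a, s) for i, (a, s) in enumerate(zip(attention, scores)) if i != index]
    total = sum(a for a, _ in kept)
    return predict([a / total for a, _ in kept], [s for _, s in kept])


def ablation_importance(attention, scores):
    """Absolute change in the prediction when each token is removed in turn."""
    base = predict(attention, scores)
    return [abs(base - ablate(attention, scores, i)) for i in range(len(scores))]


# ASSUMPTION: invented example where attention peaks on an uninformative token.
tokens = ["the", "excellent", "but", "overpriced"]
scores = [0.1, 3.0, 0.2, -2.5]
attention = [0.55, 0.20, 0.05, 0.20]

importance = ablation_importance(attention, scores)
top_by_attention = tokens[attention.index(max(attention))]
top_by_ablation = tokens[importance.index(max(importance))]

for token, weight, effect in zip(tokens, attention, importance):
    print(f"{token:>10}  attention={weight:.2f}  ablation effect={effect:.3f}")
print("Highest attention:", top_by_attention)
print("Largest ablation effect:", top_by_ablation)
```

Here the heat map would highlight "the", while removing "excellent" changes the prediction far more. Ablation answers the counterfactual question directly, whereas the attention weight only describes how the pooled representation was assembled.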

The debate remains active. Some researchers argue that attention can still be informative under carefully defined conditions and should not be dismissed entirely. Others maintain that the existence of radically different attention maps producing the same output fundamentally limits attention’s explanatory value. What both sides largely agree on is that a colourful attention heat map should not automatically be treated as a faithful explanation of model reasoning. [ACL Anthology; arXiv: Attention is not not Explanation, 2019]

The Key Takeaway

The finding that different attention maps can produce the same answer is one of the clearest demonstrations that attention is not the same thing as explanation. Counterfactual experiments show that a model’s prediction can remain stable even when attention shifts dramatically across the input. This suggests that attention visualisations often describe one possible computational route rather than the unique reason for a decision. For anyone trying to understand artificial intelligence systems, the lesson is straightforward: attention maps can provide clues, but explanations require evidence that the highlighted information actually mattered to the outcome. [arXiv; ACL Anthology: Attention is not Explanation, 2019]

Endnotes

Source: arxiv.org
Title: Attention is not Explanation
Link: https://arxiv.org/abs/1902.10186
Published: February 26, 2019

Source: arxiv.org
Title: Attention is not Explanation (S. Jain, 2019; PDF, v3, 8 May 2019)
Link: https://arxiv.org/pdf/1902.10186

Source: arxiv.org
Title: Attention is not not Explanation
Link: https://arxiv.org/abs/1908.04626

Source: arxiv.org
Title: Quantifying Attention Flow in Transformers (S. Abnar, 2020)
Link: https://arxiv.org/abs/2005.00928
Published: May 2, 2020

Source: arxiv.org
Title: Attention cannot be an Explanation
Link: https://arxiv.org/abs/2201.11194

Source: arxiv.org
Title: Interpreting Transformers Through Attention Head...
Link: https://arxiv.org/html/2601.04398v4

Source: arxiv.org
Title: Counterfactual Reasoning for Path-Based Explainable... (Y. Li, 2024)
Link: https://arxiv.org/abs/2401.05744

Source: aclanthology.org
Title: Attention is not Explanation (S. Jain, 2019; PDF)
Link: https://aclanthology.org/N19-1357.pdf

Source: aclanthology.org
Title: Attention is not not Explanation (S. Wiegreffe, 2019)
Link: https://aclanthology.org/D19-1002/

Source: aclanthology.org
Title: Quantifying Attention Flow in Transformers (2020)
Link: https://aclanthology.org/2020.acl-main.385/

Source: aclanthology.org
Title: Is Attention Explanation? An Introduction to the Debate (A. Bibal, 2022)
Link: https://aclanthology.org/2022.acl-long.269.pdf

Source: aclanthology.org
Title: Attention is not Explanation (S. Jain, 2019)
Link: https://aclanthology.org/N19-1357/

Source: aclanthology.org
Title: Quantifying Attention Flow in Transformers (S. Abnar, 2020; PDF)
Link: https://aclanthology.org/2020.acl-main.385.pdf

Source: Wikipedia
Title: Attention
Link: https://en.wikipedia.org/wiki/Attention

Source: dictionary.cambridge.org
Title: Attention: English meaning (Cambridge Dictionary)
Link: https://dictionary.cambridge.org/dictionary/english/attention

Source: github.com
Title: Code for EMNLP 2019 paper "Attention is not..." (based on the code provided by Sarthak Jain and Byron Wallace)
Link: https://github.com/sarahwie/attention

Source: samiraabnar.github.io
Title: Quantifying Attention Flow in Transformers (5 Apr 2020)
Link: https://samiraabnar.github.io/articles/2020-04/attention_flow

Additional References

Source: merriam-webster.com
Title: ATTENTION Definition & Meaning
Link: https://www.merriam-webster.com/dictionary/attention

Source: openreview.net
Title: Attention Interpretability Across NLP Tasks (S. Vashishth)
Link: https://openreview.net/pdf?id=BJe-_CNKPH

Source: scribd.com
Title: Attention Mechanisms Lack Explanatory Power
Link: https://www.scribd.com/document/539572843/1902-10186

Source: medium.com
Title: Attention is not not Explanation
Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017

Source: bibbase.org
Title: Attention is not not Explanation
Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019

Source: reddit.com
Title: [Discussion] is attention an explanation? (r/MachineLearning)
Link: https://www.reddit.com/r/MachineLearning/comments/1003d7w/discussion_is_attention_an_explanation/

Source: semanticscholar.org
Title: Quantifying Attention Flow in Transformers
Link: https://www.semanticscholar.org/paper/Quantifying-Attention-Flow-in-Transformers-Abnar-Zuidema/76a9f336481b39515d6cea2920696f11fb686451

Source: pure.uva.nl
Title: Quantifying Attention Flow in Transformers (UvA-DARE, Digital Academic Repository)
Link: https://pure.uva.nl/ws/files/178487922/2020.acl-main.385.pdf

Source: github.com
Link: https://github.com/samiraabnar/attention_flow

Source: researchgate.net
Title: Attention is not not Explanation
Link: https://www.researchgate.net/publication/336999161_Attention_is_not_not_Explanation
