Within Transformers
Can attention maps really explain AI?
Attention maps can look intuitive, but they are not simple proof of which input caused a model's answer.
On this page
- What attention weights appear to show
- How information mixes across layers
- Safer ways to read model explanations
Page outline Jump by section
Introduction
People often assume that an attention map reveals exactly why an artificial intelligence system produced a particular answer. The appeal is obvious: if a model assigns high attention to certain words, phrases, or image regions, those elements appear to be the cause of the output. However, research over the past several years has shown that this interpretation is often too simple. Attention weights can provide useful clues about how information moves through a model, but they are not a direct record of causation or reasoning. Studies have found that models can sometimes produce the same prediction with very different attention patterns, and that attention scores often disagree with other measures of feature importance. As a result, many researchers now treat attention maps as one piece of evidence rather than a complete explanation. [ACL Anthology]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2548 — Our findings show that standard attention modules do not prov…
Can attention maps really explain AI?
What attention weights appear to show
An attention map displays how strongly one token attends to other tokens during a particular computation. In a language model, a word might assign high attention to a previous noun, a punctuation mark, or another part of the sentence. When visualised as coloured heat maps, these patterns can look highly intuitive.
For example, if a model answering a question about a sentence places strong attention on the sentence’s key noun, it is tempting to conclude that the noun caused the answer. This interpretation became popular because attention weights resemble a form of human-readable importance score. Attention mechanisms were also originally described in terms of focusing on relevant information, reinforcing the idea that high attention equals high influence. [Wikipedia]WikipediaAttention (machine learningAttention (machine learning
The problem is that attention weights describe how information is combined at a specific step, not necessarily which input features ultimately determine the output. A token can receive relatively little attention yet still have a large influence through other pathways in the network. Conversely, a token may attract substantial attention without being decisive for the final prediction. [arXiv]arxiv.orgAttention is Not Only a Weight: Analyzing Transformers with…April 21, 2020 — by G Kobayashi · 2020 · Cited by 317 — This paper sh…
One influential 2019 study examined whether attention weights align with other measures of importance, such as gradients that estimate how much changing an input would affect the output. The researchers found that attention often showed weak correlation with these alternative measures. They also demonstrated that substantially different attention distributions could sometimes produce essentially the same prediction. Their conclusion was that standard attention weights should not automatically be treated as explanations. [ACL Anthology]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2548 — Our findings show that standard attention modules do not prov…
How information mixes across layers
A deeper challenge arises from the architecture of Transformer models themselves. Modern Transformers contain many layers, and each layer repeatedly mixes information from different tokens. By the time a model reaches its final layers, the representation associated with a single token is no longer tied cleanly to one location in the original input.
Research on attention flow showed that information becomes increasingly blended as it passes through successive self-attention layers. A high attention score in a late layer therefore does not necessarily indicate direct reliance on the corresponding input token. Instead, that score may reflect a representation that already contains information gathered from many other positions earlier in the network. [ACL Anthology+2arXiv]aclanthology.orgACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1609 — Thus, across layers of the Transformer, inform…
A useful analogy is a river system. Looking only at the final channel does not reveal where the water originally came from because many tributaries have merged upstream. Similarly, a late-layer attention map may highlight one token even though the underlying information originated from several different parts of the input.
This mixing effect means that raw attention weights can become increasingly misleading when interpreted as a direct answer to the question, “Which input mattered most?” The attention values describe local routing decisions within the network, while the influence of an input may depend on a long chain of interactions across multiple layers. [ACL Anthology]aclanthology.orgACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1609 — Thus, across layers of the Transformer, inform…
Why different attention patterns can lead to the same answer
Another reason for caution is that neural networks often contain multiple pathways for reaching the same prediction. Researchers have constructed alternative or counterfactual attention distributions that differ substantially from the original attention pattern while leaving the model’s output nearly unchanged. [ACL Anthology]aclanthology.orgACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2548 — Our findings show that standard attention modules do not prov…
This finding matters because an explanation should ideally identify factors that are necessary for the decision. If radically different attention maps can produce the same result, then a single attention map may not uniquely explain why the model answered as it did.
The issue becomes even more complicated in multi-head attention. Different heads may specialise in different relationships, such as syntax, positional information, or entity tracking. A heat map from one head can therefore present only a partial view of the computation. Looking at a single head may create an appealing story while hiding important contributions from other heads and later processing stages. [Wikipedia]WikipediaAttention (machine learningAttention (machine learning
What researchers disagree about
Although many studies caution against equating attention with explanation, the debate is not settled. Some researchers argue that the question depends on what is meant by an explanation in the first place.
A notable response to the “attention is not explanation” critique argued that attention can still provide useful explanatory information under certain conditions. Rather than treating attention as a complete causal account, these researchers proposed evaluating attention using more rigorous tests and comparing it against baselines. Their position was not that attention perfectly explains model behaviour, but that it can sometimes offer meaningful insight when interpreted carefully. [ACL Anthology]aclanthology.orgWe challenge many of the assumptions underlying this work.Read moreACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1710 — A recent paper claims that 'Attention is not Explana…
This debate has shifted the field away from simple yes-or-no claims. The more common view today is that attention maps may reveal something about a model’s internal processing, but their explanatory value depends on context, architecture, evaluation method, and the specific question being asked. [ACL Anthology]aclanthology.orgWe challenge many of the assumptions underlying this work.Read moreACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1710 — A recent paper claims that 'Attention is not Explana…
Safer ways to read model explanations
Because raw attention weights can be misleading, researchers have developed methods that attempt to track influence more faithfully.
One approach is attention flow or attention rollout, which traces information through multiple layers instead of examining a single attention matrix in isolation. Studies have found that these methods correlate more closely with other importance measures than raw attention weights do. [arXiv]arxiv.orgarXiv Quantifying Attention Flow in TransformersarXiv Quantifying Attention Flow in Transformers
Another approach compares attention-based interpretations with independent techniques such as:
- Gradient-based attribution, which estimates how output changes when inputs change.
- Ablation tests, which remove or alter parts of the input and measure the effect on predictions.
- Shapley-value methods, which estimate each feature’s contribution by analysing many combinations of inputs.
- Counterfactual testing, which examines whether changing a supposedly important input actually changes the outcome. [arXiv+2arXiv]arxiv.orgarXiv Quantifying Attention Flow in TransformersarXiv Quantifying Attention Flow in Transformers
Using multiple explanation methods together is generally more reliable than relying on attention maps alone. When several independent techniques point to the same important features, confidence in the interpretation increases.
What this means for understanding AI
Attention maps remain valuable because they can reveal patterns that are difficult to see otherwise. They can help researchers inspect model behaviour, identify biases, and explore how information moves through a Transformer. However, they should not be treated as straightforward proof that a particular input caused a particular answer.
The central lesson is that attention measures information routing, not necessarily importance or causation. As information mixes across layers and multiple computational paths interact, the connection between an attention score and a model’s final decision becomes less direct. Attention visualisations can therefore be informative, but only when interpreted as partial evidence within a broader toolkit of explanation methods. [arXiv+3ACL Anthology+3arXiv]aclanthology.orgACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1609 — Thus, across layers of the Transformer, inform…
Amazon book picks
Further Reading
Books and field guides related to Can attention maps really explain AI?. Use these as the next step if you want deeper reading beyond the article.
Natural Language Processing with Transformers
Covers attention-based transformer models in practical detail.
Artificial Intelligence
Helps readers understand limits of AI reasoning and interpretation.
Deep Learning
Rating: 3.5/5 from 6 Google Books ratings
Provides the neural-network background needed for understanding attention mechanisms.
Interpretable Machine Learning
Explains how to evaluate model explanations rather than trusting attractive visualisations.
eBay marketplace picks
Marketplace Samples
Example marketplace items related to this page. Use the search link to explore similar finds on eBay.
Endnotes
-
Source: arxiv.org
Title: arXiv Attention is not Explanation
Link: https://arxiv.org/abs/1902.10186Source snippet
[1902.10186] Attention is not Explanationby S Jain · 2019 · Cited by 2494 — Our findings show that standard attention modules do not prov...
-
Source: Wikipedia
Title: Attention ([machine learning]({{ ‘machine-learning/’ | relative_url }}))
Link: https://en.wikipedia.org/wiki/Attention_%28machine_learning%29 -
Source: arxiv.org
Link: https://arxiv.org/abs/2004.10102Source snippet
Attention is Not Only a Weight: Analyzing Transformers with...April 21, 2020 — by G Kobayashi · 2020 · Cited by 317 — This paper sh...
Published: April 21, 2020
-
Source: arxiv.org
Title: arXiv Quantifying Attention Flow in Transformers
Link: https://arxiv.org/abs/2005.00928 -
Source: arxiv.org
Title: arXiv Attention is not not Explanation
Link: https://arxiv.org/abs/1908.04626 -
Source: arxiv.org
Title: arXiv Attention Flows are Shapley Value Explanations
Link: https://arxiv.org/abs/2105.14652 -
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/AttentionSource snippet
AttentionAttention is the concentration of awareness directed at some task or phenomenon while mostly excluding others. Focused attent...
-
Source: aclanthology.org
Link: https://aclanthology.org/N19-1357/Source snippet
ACL AnthologyAttention is not Explanationby S Jain · 2019 · Cited by 2548 — Our findings show that standard attention modules do not prov...
-
Source: aclanthology.org
Link: https://aclanthology.org/2020.acl-main.385/Source snippet
ACL AnthologyQuantifying Attention Flow in Transformersby S Abnar · 2020 · Cited by 1609 — Thus, across layers of the Transformer, inform...
-
Source: samiraabnar.github.io
Title: attention flow
Link: https://samiraabnar.github.io/articles/2020-04/attention_flowSource snippet
Quantifying Attention Flow in Transformers5 Apr 2020 — This makes attention weights unreliable as explanation probes to answer questions...
-
Source: aclanthology.org
Title: We challenge many of the assumptions underlying this work.Read more
Link: https://aclanthology.org/D19-1002/Source snippet
ACL AnthologyAttention is not not Explanationby S Wiegreffe · 2019 · Cited by 1710 — A recent paper claims that 'Attention is not Explana...
-
Source: aclanthology.org
Link: https://aclanthology.org/N19-1357.pdfSource snippet
Attention is not Explanationby S Jain · 2019 · Cited by 2494 — (i) Attention weights should correlate with feature importance measures (e...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/331396991_Attention_is_not_ExplanationSource snippet
materially affect the prediction, especially in deep, multi-layer...Read more...
-
Source: github.com
Link: https://github.com/sarahwie/attentionSource snippet
Code for EMNLP 2019 paper "Attention is not...We've based our repository on the code provided by Sarthak Jain & Byron Wallace for their...
-
Source: liner.com
Title: attention is not explanation
Link: https://liner.com/review/attention-is-not-explanationSource snippet
[Quick Review]26 Feb 2019 — This work aims to assess the degree to which attention weights provide meaningful 'explanations' for predicti...
Additional References
-
Source: medium.com
Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017Source snippet
Attention is not not ExplanationExplanation can mean different things, and J&W are not clear on which interpretation they wish to disasso...
-
Source: bibbase.org
Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019Source snippet
Attention is not not ExplanationWe propose four alternative tests to determine when/whether attention can be used as explanation: a simpl...
-
Source: openreview.net
Link: https://openreview.net/pdf/77bf7201d7f76cb90754860c7a126c14ffb9c5ba.pdfSource snippet
GENERALIZED ATTENTION FLOWIn XAI, axioms are core principles that guide the evaluation of explanation methods, ensuring their reliability...
-
Source: semanticscholar.org
Link: https://www.semanticscholar.org/paper/Attention-is-not-Explanation-Jain-Wallace/1e83c20def5c84efa6d4a0d80aa3159f55cb9c3fSource snippet
[PDF] Attention is not ExplanationThis work performs extensive experiments across a variety of NLP tasks to assess the degree to which at...
-
Source: elib.uni-stuttgart.de
Link: https://elib.uni-stuttgart.de/bitstreams/af346489-f4ec-4c08-b581-77bf6f6c370a/downloadSource snippet
Systematic Review of Explainable AI Methods for...by M Denizoglu — Although attention flow weights are more reliable than attention roll...
-
Source: openreview.net
Title: Attention Flows for General Transformersby N Metzger · Cited by 4 —
Link: https://openreview.net/forum?id=pcBJT4bgbpHSource snippet
Abstract: In this paper, we study the computation of how much an input token in a Transformer model influences its prediction.Read more...
-
Source: medium.com
Title: exploring visual attention in transformer models ab538c06083a
Link: https://medium.com/%40nivonl/exploring-visual-attention-in-transformer-models-ab538c06083aSource snippet
Exploring Visual Attention in Transformer ModelsThe attention rollout method developed in Abnar & Zuidema(2020) is for [language models]({{ 'language-models/' | relative_url }}) in...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/343298597_Quantifying_Attention_Flow_in_TransformersSource snippet
ss transformer layers to estimate token influence in the final prediction [9].Read more...
-
Source: youtube.com
Title: Self Attention Geometric Intuition | How to Visualize Self Attention
Link: https://www.youtube.com/watch?v=5ZgGuujZSbsSource snippet
Attention is not not explanation + Character Eyes: Seeing Language through Character-Level Taggers | - YouTube Attention is not not expla...
-
Source: bibbase.org
Title: jain wallace attentionisnotexplanation 2019
Link: https://bibbase.org/network/publication/jain-wallace-attentionisnotexplanation-2019Source snippet
Attention is not ExplanationAttention is not Explanation. Jain, S. & Wallace, B. C. In Proceedings of the 2019 Conference of the North Am...
Topic Tree



