Can attention maps really explain AI?

Introduction

People often assume that an attention map reveals exactly why an artificial intelligence system produced a particular answer. The appeal is obvious: if a model assigns high attention to certain words, phrases, or image regions, those elements appear to be the cause of the output. However, research over the past several years has shown that this interpretation is often too simple. Attention weights can provide useful clues about how information moves through a model, but they are not a direct record of causation or reasoning. Studies have found that models can sometimes produce the same prediction with very different attention patterns, and that attention scores often disagree with other measures of feature importance. As a result, many researchers now treat attention maps as one piece of evidence rather than a complete explanation. (Jain and Wallace, 2019)

What attention weights appear to show

An attention map displays how strongly one token attends to other tokens during a particular computation. In a language model, a word might assign high attention to a previous noun, a punctuation mark, or another part of the sentence. When visualised as coloured heat maps, these patterns can look highly intuitive.
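As a rough illustration of where such a map comes from, the sketch below computes scaled dot-product attention weights with NumPy. The token list, the random query and key vectors, and the function name `attention_weights` are all assumptions made for the example; a real model would supply learned projections.

```python
import numpy as np

def attention_weights(queries, keys):
    """Return a (tokens, tokens) matrix; row i shows how strongly token i attends to each token."""
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)               # scaled dot products
    scores = scores - scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)           # softmax over each row

# Hypothetical inputs: three tokens with random 4-dimensional queries and keys.
rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat"]
queries = rng.normal(size=(3, 4))
keys = rng.normal(size=(3, 4))

weights = attention_weights(queries, keys)
for token, row in zip(tokens, weights):
    print(token, np.round(row, 2))
```

Each row sums to 1, which is what makes the matrix easy to draw as a heat map. That normalisation is also part of why the values are tempting to read as importance scores.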

For example, if a model answering a question about a sentence places strong attention on the sentence’s key noun, it is tempting to conclude that the noun caused the answer. This interpretation became popular because attention weights resemble a form of human-readable importance score. Attention mechanisms were also originally described in terms of focusing on relevant information, reinforcing the idea that high attention equals high influence. (Wikipedia, “Attention (machine learning)”)

The problem is that attention weights describe how information is combined at a specific step, not necessarily which input features ultimately determine the output. A token can receive relatively little attention yet still have a large influence through other pathways in the network. Conversely, a token may attract substantial attention without being decisive for the final prediction. (Kobayashi et al., 2020)
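One simple way to see this, loosely in the spirit of the analysis cited above, is to look at the size of the vector each weight multiplies rather than the weight alone. The numbers below are invented for illustration and do not come from any real model.

```python
import numpy as np

# Hypothetical attention weights for one query over three tokens.
weights = np.array([0.70, 0.25, 0.05])

# Hypothetical value vectors; the third token's vector has a much larger norm.
values = np.array([
    [0.1, 0.0],
    [0.0, 0.2],
    [6.0, 8.0],
])

weighted = weights[:, None] * values                    # each token's contribution to the output
contribution_norms = np.linalg.norm(weighted, axis=1)   # size of each contribution
output = weighted.sum(axis=0)                           # attention output for this query

print("attention weights :", weights)
print("contribution norms:", np.round(contribution_norms, 3))
```

In this toy case the token with the smallest attention weight makes the largest contribution to the output vector, so ranking tokens by weight alone would point at the wrong one.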

One influential 2019 study examined whether attention weights align with other measures of importance, such as gradients that estimate how much changing an input would affect the output. The researchers found that attention often showed weak correlation with these alternative measures. They also demonstrated that substantially different attention distributions could sometimes produce essentially the same prediction. Their conclusion was that standard attention weights should not automatically be treated as explanations. (Jain and Wallace, 2019)

How information mixes across layers

A deeper challenge arises from the architecture of Transformer models themselves. Modern Transformers contain many layers, and each layer repeatedly mixes information from different tokens. By the time a model reaches its final layers, the representation associated with a single token is no longer tied cleanly to one location in the original input.

Research on attention flow showed that information becomes increasingly blended as it passes through successive self-attention layers. A high attention score in a late layer therefore does not necessarily indicate direct reliance on the corresponding input token. Instead, that score may reflect a representation that already contains information gathered from many other positions earlier in the network. (Abnar and Zuidema, 2020)

A useful analogy is a river system. Looking only at the final channel does not reveal where the water originally came from because many tributaries have merged upstream. Similarly, a late-layer attention map may highlight one token even though the underlying information originated from several different parts of the input.

This mixing effect means that raw attention weights can become increasingly misleading when interpreted as a direct answer to the question, “Which input mattered most?” The attention values describe local routing decisions within the network, while the influence of an input may depend on a long chain of interactions across multiple layers. (Abnar and Zuidema, 2020)

Why different attention patterns can lead to the same answer

Another reason for caution is that neural networks often contain multiple pathways for reaching the same prediction. Researchers have constructed alternative or counterfactual attention distributions that differ substantially from the original attention pattern while leaving the model’s output nearly unchanged. (Jain and Wallace, 2019)

This finding matters because an explanation should ideally identify factors that are necessary for the decision. If radically different attention maps can produce the same result, then a single attention map may not uniquely explain why the model answered as it did.
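A deliberately small example makes the point concrete. It is a hand-built toy, not the procedure used in the cited study: two tokens are assumed to carry identical value vectors, so shifting attention between them cannot change the result. The vectors, the `classifier` weights, and the `predict` helper are all hypothetical.

```python
import numpy as np

# Hypothetical value vectors for three tokens.
values = np.array([
    [1.0, 2.0],   # token 0
    [1.0, 2.0],   # token 1 carries the same information as token 0
    [0.0, 0.0],   # token 2 contributes nothing
])
classifier = np.array([0.5, -0.1])   # hypothetical output layer

def predict(attn):
    """Attention-weighted sum of the values, followed by a linear scoring layer."""
    context = attn @ values
    return float(context @ classifier)

attn_a = np.array([0.9, 0.1, 0.0])   # mostly attends to token 0
attn_b = np.array([0.1, 0.9, 0.0])   # mostly attends to token 1

print("prediction with attn_a:", predict(attn_a))
print("prediction with attn_b:", predict(attn_b))
```

The two heat maps would look very different, yet the prediction is the same. Neither map on its own identifies a token that was necessary for the output.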

The issue becomes even more complicated in multi-head attention. Different heads may specialise in different relationships, such as syntax, positional information, or entity tracking. A heat map from one head can therefore present only a partial view of the computation. Looking at a single head may create an appealing story while hiding important contributions from other heads and later processing stages. (Wikipedia, “Attention (machine learning)”)

What researchers disagree about

Although many studies caution against equating attention with explanation, the debate is not settled. Some researchers argue that the question depends on what is meant by an explanation in the first place.

A notable response to the “attention is not explanation” critique argued that attention can still provide useful explanatory information under certain conditions. Rather than treating attention as a complete causal account, these researchers proposed evaluating attention using more rigorous tests and comparing it against baselines. Their position was not that attention perfectly explains model behaviour, but that it can sometimes offer meaningful insight when interpreted carefully. (Wiegreffe and Pinter, 2019)

This debate has shifted the field away from simple yes-or-no claims. The more common view today is that attention maps may reveal something about a model’s internal processing, but their explanatory value depends on context, architecture, evaluation method, and the specific question being asked. (Wiegreffe and Pinter, 2019)

Safer ways to read model explanations

Because raw attention weights can be misleading, researchers have developed methods that attempt to track influence more faithfully.

One approach uses attention rollout and attention flow, two related methods that trace information through multiple layers instead of examining a single attention matrix in isolation. The study that introduced them reports that both correlate more closely with other importance measures than raw attention weights do. (Abnar and Zuidema, 2020)
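The sketch below shows the basic idea behind attention rollout under simplifying assumptions: one attention matrix per layer (heads already averaged), an identity term mixed in to stand for the residual connection, and the per-layer matrices multiplied together. The two hand-written layers are hypothetical, and the function is an illustration rather than a reference implementation.

```python
import numpy as np

def attention_rollout(layer_attentions):
    """Combine per-layer (tokens, tokens) attention matrices, first layer first."""
    n = layer_attentions[0].shape[0]
    rollout = np.eye(n)
    for attn in layer_attentions:
        mixed = 0.5 * attn + 0.5 * np.eye(n)               # account for the residual connection
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)  # keep each row summing to 1
        rollout = mixed @ rollout                          # propagate back toward the input tokens
    return rollout

# Hypothetical two-layer example with three tokens.
layer1 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])   # token 2 reads from token 0
layer2 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],    # token 1 reads from token 2
                   [0.0, 0.0, 1.0]])

rollout = attention_rollout([layer1, layer2])
print("raw last-layer attention, token 1 -> token 0:", layer2[1, 0])
print("rollout,                  token 1 -> token 0:", rollout[1, 0])
```

In the last layer, token 1 places no direct attention on token 0. The rollout still assigns token 0 a nonzero share, because token 2 had already absorbed information from token 0 one layer earlier.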

Another approach compares attention-based interpretations with independent techniques such as:

Gradient-based attribution, which estimates how output changes when inputs change.
Ablation tests, which remove or alter parts of the input and measure the effect on predictions.
Shapley-value methods, which estimate each feature’s contribution by analysing many combinations of inputs.
Counterfactual testing, which examines whether changing a supposedly important input actually changes the outcome. (Abnar and Zuidema, 2020)
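Of the techniques listed, an ablation test is the simplest to sketch. The example below replaces one input at a time with a baseline value and records how far the prediction moves. The linear `toy_model`, its coefficients, and the attention scores it is compared against are hypothetical stand-ins for a real model and its real attention map.

```python
import numpy as np

def ablation_importance(predict, inputs, baseline=0.0):
    """Drop in the prediction when each input feature is replaced by `baseline`."""
    inputs = np.asarray(inputs, dtype=float)
    full = predict(inputs)
    drops = np.empty(len(inputs))
    for i in range(len(inputs)):
        ablated = inputs.copy()
        ablated[i] = baseline              # remove one feature at a time
        drops[i] = full - predict(ablated)
    return drops

# Hypothetical stand-in for a trained model: a fixed linear scorer.
coefficients = np.array([0.1, 2.0, -0.5])

def toy_model(x):
    return float(coefficients @ x)

# Hypothetical attention scores over the same three inputs.
hypothetical_attention = np.array([0.8, 0.1, 0.1])

drops = ablation_importance(toy_model, [1.0, 1.0, 1.0])
print("ablation drops     :", np.round(drops, 3))
print("top by ablation    :", int(np.argmax(np.abs(drops))))
print("top by attention   :", int(np.argmax(hypothetical_attention)))
```

Here the input with the highest attention score is not the one whose removal changes the prediction most. This kind of disagreement is what comparing attention against an independent method is meant to expose.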

Using multiple explanation methods together is generally more reliable than relying on attention maps alone. When several independent techniques point to the same important features, confidence in the interpretation increases.

What this means for understanding AI

Attention maps remain valuable because they can reveal patterns that are difficult to see otherwise. They can help researchers inspect model behaviour, identify biases, and explore how information moves through a Transformer. However, they should not be treated as straightforward proof that a particular input caused a particular answer.

The central lesson is that attention measures information routing, not necessarily importance or causation. As information mixes across layers and multiple computational paths interact, the connection between an attention score and a model’s final decision becomes less direct. Attention visualisations can therefore be informative, but only when interpreted as partial evidence within a broader toolkit of explanation methods. (Abnar and Zuidema, 2020)

Endnotes

Source: arxiv.org
Title: Attention is not Explanation (S. Jain and B. C. Wallace, 2019)
Link: https://arxiv.org/abs/1902.10186
Source: Wikipedia
Title: Attention (machine learning)
Link: https://en.wikipedia.org/wiki/Attention_%28machine_learning%29
Source: arxiv.org
Title: Attention is Not Only a Weight: Analyzing Transformers with... (G. Kobayashi et al., 2020)
Link: https://arxiv.org/abs/2004.10102
Source: arxiv.org
Title: Quantifying Attention Flow in Transformers
Link: https://arxiv.org/abs/2005.00928
Source: arxiv.org
Title: Attention is not not Explanation
Link: https://arxiv.org/abs/1908.04626
Source: arxiv.org
Title: Attention Flows are Shapley Value Explanations
Link: https://arxiv.org/abs/2105.14652
Source: Wikipedia
Title: Attention
Link: https://en.wikipedia.org/wiki/Attention
Source: aclanthology.org
Title: Attention is not Explanation (S. Jain and B. C. Wallace, 2019)
Link: https://aclanthology.org/N19-1357/
Source: aclanthology.org
Title: Quantifying Attention Flow in Transformers (S. Abnar and W. Zuidema, 2020)
Link: https://aclanthology.org/2020.acl-main.385/
Source: samiraabnar.github.io
Title: Quantifying Attention Flow in Transformers
Link: https://samiraabnar.github.io/articles/2020-04/attention_flow
Source: aclanthology.org
Title: Attention is not not Explanation (S. Wiegreffe and Y. Pinter, 2019)
Link: https://aclanthology.org/D19-1002/
Source: aclanthology.org
Title: Attention is not Explanation (PDF)
Link: https://aclanthology.org/N19-1357.pdf
Source: researchgate.net
Title: Attention is not Explanation
Link: https://www.researchgate.net/publication/331396991_Attention_is_not_Explanation
Source: github.com
Title: sarahwie/attention (code repository)
Link: https://github.com/sarahwie/attention
Source: liner.com
Title: Attention is not Explanation (quick review)
Link: https://liner.com/review/attention-is-not-explanation

Additional References

Source: medium.com
Title: Attention is not not Explanation
Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017
Source: bibbase.org
Title: Attention is not not Explanation
Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019
Source: openreview.net
Title: Generalized Attention Flow
Link: https://openreview.net/pdf/77bf7201d7f76cb90754860c7a126c14ffb9c5ba.pdf
Source: semanticscholar.org
Title: Attention is not Explanation
Link: https://www.semanticscholar.org/paper/Attention-is-not-Explanation-Jain-Wallace/1e83c20def5c84efa6d4a0d80aa3159f55cb9c3f
Source: elib.uni-stuttgart.de
Title: Systematic Review of Explainable AI Methods for... (M. Denizoglu)
Link: https://elib.uni-stuttgart.de/bitstreams/af346489-f4ec-4c08-b581-77bf6f6c370a/download
Source: openreview.net
Title: Attention Flows for General Transformers (N. Metzger)
Link: https://openreview.net/forum?id=pcBJT4bgbpH
Source: medium.com
Title: Exploring Visual Attention in Transformer Models
Link: https://medium.com/%40nivonl/exploring-visual-attention-in-transformer-models-ab538c06083a
Source: researchgate.net
Title: Quantifying Attention Flow in Transformers
Link: https://www.researchgate.net/publication/343298597_Quantifying_Attention_Flow_in_Transformers
Source: youtube.com
Title: Self Attention Geometric Intuition | How to Visualize Self Attention
Link: https://www.youtube.com/watch?v=5ZgGuujZSbs
Source: bibbase.org
Title: Attention is not Explanation (S. Jain and B. C. Wallace, 2019)
Link: https://bibbase.org/network/publication/jain-wallace-attentionisnotexplanation-2019
