
Can attention maps really explain AI?

Attention maps can look intuitive, but they are not simple proof of which input caused a model's answer.


Introduction

People often assume that an attention map reveals exactly why an artificial intelligence system produced a particular answer. The appeal is obvious: if a model assigns high attention to certain words, phrases, or image regions, those elements appear to be the cause of the output. However, research over the past several years has shown that this interpretation is often too simple. Attention weights can provide useful clues about how information moves through a model, but they are not a direct record of causation or reasoning. Studies have found that models can sometimes produce the same prediction with very different attention patterns, and that attention scores often disagree with other measures of feature importance. As a result, many researchers now treat attention maps as one piece of evidence rather than a complete explanation. [ACL Anthology: Attention is not Explanation, S. Jain, 2019]


What attention weights appear to show

An attention map displays how strongly one token attends to other tokens during a particular computation. In a language model, a word might assign high attention to a previous noun, a punctuation mark, or another part of the sentence. When visualised as coloured heat maps, these patterns can look highly intuitive.

For example, if a model answering a question about a sentence places strong attention on the sentence’s key noun, it is tempting to conclude that the noun caused the answer. This interpretation became popular because attention weights resemble a form of human-readable importance score. Attention mechanisms were also originally described in terms of focusing on relevant information, reinforcing the idea that high attention equals high influence. [Wikipedia: Attention (machine learning)]

The problem is that attention weights describe how information is combined at a specific step, not necessarily which input features ultimately determine the output. A token can receive relatively little attention yet still have a large influence through other pathways in the network. Conversely, a token may attract substantial attention without being decisive for the final prediction. [arXiv: Attention is Not Only a Weight: Analyzing Transformers with…, G. Kobayashi, 2020]
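One way to see how a weakly attended token can still matter is to look at what each weight is multiplied by. The sketch below is a toy illustration with made-up numbers, not code from any cited paper: the `keys`, `query`, and `values` arrays are hypothetical, and the model is a single scaled dot-product attention step. It compares each token's attention weight with the size of the vector that token actually contributes to the output.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = keys.shape[-1]
    return softmax(keys @ query / np.sqrt(d))

# Hypothetical toy inputs: three tokens with 2-dimensional keys and values.
keys = np.array([[2.0, 0.0],
                 [0.0, 0.0],
                 [0.0, 0.0]])
query = np.array([1.0, 0.0])
values = np.array([[0.1, 0.0],   # token 0: strongly attended, tiny value vector
                   [0.0, 3.0],   # token 1: weakly attended, large value vector
                   [0.0, 0.0]])  # token 2: contributes nothing

weights = attention_weights(query, keys)

# What each token actually adds to the output: its weight times its value.
contribution = weights[:, None] * values
contribution_norm = np.linalg.norm(contribution, axis=1)

print("attention weights :", np.round(weights, 3))
print("contribution norms:", np.round(contribution_norm, 3))
```

In this toy case token 0 receives the largest attention weight (about 0.67), yet token 1 contributes the largest vector to the output, so a heat map of the weights alone would point at the wrong token.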

One influential 2019 study examined whether attention weights align with other measures of importance, such as gradients that estimate how much changing an input would affect the output. The researchers found that attention often showed weak correlation with these alternative measures. They also demonstrated that substantially different attention distributions could sometimes produce essentially the same prediction. Their conclusion was that standard attention weights should not automatically be treated as explanations. [ACL Anthology: Attention is not Explanation, S. Jain, 2019]

How information mixes across layers

A deeper challenge arises from the architecture of Transformer models themselves. Modern Transformers contain many layers, and each layer repeatedly mixes information from different tokens. By the time a model reaches its final layers, the representation associated with a single token is no longer tied cleanly to one location in the original input.

Research on attention flow showed that information becomes increasingly blended as it passes through successive self-attention layers. A high attention score in a late layer therefore does not necessarily indicate direct reliance on the corresponding input token. Instead, that score may reflect a representation that already contains information gathered from many other positions earlier in the network. [ACL Anthology, arXiv: Quantifying Attention Flow in Transformers, S. Abnar, 2020]

A useful analogy is a river system. Looking only at the final channel does not reveal where the water originally came from because many tributaries have merged upstream. Similarly, a late-layer attention map may highlight one token even though the underlying information originated from several different parts of the input.

This mixing effect means that raw attention weights can become increasingly misleading when interpreted as a direct answer to the question, “Which input mattered most?” The attention values describe local routing decisions within the network, while the influence of an input may depend on a long chain of interactions across multiple layers. [ACL Anthology: Quantifying Attention Flow in Transformers, S. Abnar, 2020]

Why different attention patterns can lead to the same answer

Another reason for caution is that neural networks often contain multiple pathways for reaching the same prediction. Researchers have constructed alternative or counterfactual attention distributions that differ substantially from the original attention pattern while leaving the model’s output nearly unchanged. [ACL Anthology: Attention is not Explanation, S. Jain, 2019]

This finding matters because an explanation should ideally identify factors that are necessary for the decision. If radically different attention maps can produce the same result, then a single attention map may not uniquely explain why the model answered as it did.
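A hand-built example shows how two very different attention distributions can yield the same output. This is only an illustration under stated assumptions, not the procedure used in the cited study: the `votes` array is hypothetical and stands for each token's value vector after projection onto the output direction, so the model's logit is simply the attention-weighted sum of votes.

```python
import numpy as np

# Hypothetical toy setup: one scalar "vote" per token, i.e. the token's value
# vector already projected onto the direction that drives the output.
votes = np.array([1.0, 1.0, -1.0])

original = np.array([0.8, 0.1, 0.1])      # attention mostly on token 0
alternative = np.array([0.1, 0.8, 0.1])   # attention mostly on token 1

def predict(attention, votes):
    """Logistic output of an attention-weighted sum of votes."""
    logit = float(attention @ votes)
    return 1.0 / (1.0 + np.exp(-logit))

p_original = predict(original, votes)
p_alternative = predict(alternative, votes)

# Total variation distance between the two attention distributions (0 to 1).
tv_distance = 0.5 * np.abs(original - alternative).sum()

print("prediction with original attention   :", round(p_original, 4))
print("prediction with alternative attention:", round(p_alternative, 4))
print("distance between attention maps      :", round(float(tv_distance), 2))
```

Because tokens 0 and 1 carry the same vote, moving attention mass between them changes the heat map substantially (a total variation distance of 0.7) while the prediction stays the same. Real models are less tidy, but the same redundancy is what makes a single attention map non-unique as an explanation.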

The issue becomes even more complicated in multi-head attention. Different heads may specialise in different relationships, such as syntax, positional information, or entity tracking. A heat map from one head can therefore present only a partial view of the computation. Looking at a single head may create an appealing story while hiding important contributions from other heads and later processing stages. [Wikipedia: Attention (machine learning)]


What researchers disagree about

Although many studies caution against equating attention with explanation, the debate is not settled. Some researchers argue that the question depends on what is meant by an explanation in the first place.

A notable response to the “attention is not explanation” critique argued that attention can still provide useful explanatory information under certain conditions. Rather than treating attention as a complete causal account, these researchers proposed evaluating attention using more rigorous tests and comparing it against baselines. Their position was not that attention perfectly explains model behaviour, but that it can sometimes offer meaningful insight when interpreted carefully. [ACL Anthology: Attention is not not Explanation, S. Wiegreffe, 2019]

This debate has shifted the field away from simple yes-or-no claims. The more common view today is that attention maps may reveal something about a model’s internal processing, but their explanatory value depends on context, architecture, evaluation method, and the specific question being asked. [ACL Anthology: Attention is not not Explanation, S. Wiegreffe, 2019]


Safer ways to read model explanations

Because raw attention weights can be misleading, researchers have developed methods that attempt to track influence more faithfully.

One approach uses attention rollout and attention flow, two related methods that trace information through multiple layers instead of examining a single attention matrix in isolation. Studies have found that these methods correlate more closely with other importance measures than raw attention weights do. [arXiv: Quantifying Attention Flow in Transformers]
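The rollout idea can be sketched in a few lines. This is a minimal version under stated assumptions rather than a reference implementation: attention is assumed to be already averaged over heads, the residual connection is approximated by mixing each layer's matrix equally with the identity, and the two layer matrices are hypothetical toy values. Attention flow, which treats the layers as a flow network, is not shown here.

```python
import numpy as np

def attention_rollout(layer_attentions):
    """Combine per-layer attention matrices into one token-to-input map.

    layer_attentions: list of (seq_len, seq_len) row-stochastic matrices,
    ordered from the first layer to the last, already averaged over heads.
    """
    seq_len = layer_attentions[0].shape[0]
    rollout = np.eye(seq_len)
    for attn in layer_attentions:
        # Mix in the identity to approximate the residual connection
        # (the equal 0.5 weighting is an assumption), then renormalise rows.
        mixed = 0.5 * attn + 0.5 * np.eye(seq_len)
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        # Later layers attend over representations built by earlier layers.
        rollout = mixed @ rollout
    return rollout

# Hypothetical two-layer example with three tokens.
layer_1 = np.array([[0.1, 0.0, 0.9],   # token 0 mostly reads from token 2
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
layer_2 = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],   # token 1 mostly reads from token 0
                    [0.0, 0.0, 1.0]])

rollout = attention_rollout([layer_1, layer_2])

print("raw last-layer attention for token 1:", layer_2[1])
print("rollout attention for token 1       :", np.round(rollout[1], 4))
```

In the last layer, token 1 gives zero weight to input token 2. The rollout row for token 1 nevertheless assigns token 2 a non-zero share, because token 0's representation had already absorbed information from token 2 in the first layer.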

Another approach compares attention-based interpretations with independent techniques such as:

  • Gradient-based attribution, which estimates how sensitive the output is to small changes in each input.
  • Ablation tests, which remove or alter parts of the input and measure the effect on predictions.
  • Shapley-value methods, which estimate each feature’s contribution by analysing many combinations of inputs.
  • Counterfactual testing, which examines whether changing a supposedly important input actually changes the outcome. [arXiv: Quantifying Attention Flow in Transformers]
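An ablation test from the list above can be sketched on a toy model. Everything here is a hypothetical illustration: `toy_model` is a made-up one-layer attention classifier, the token vectors are invented, and "ablation" is taken to mean replacing one token's vector with zeros, which is only one of several possible choices.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax for a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def toy_model(tokens, query, w_out):
    """Hypothetical one-layer attention classifier.

    Returns (output probability, attention weights over tokens).
    """
    attention = softmax(tokens @ query)
    context = attention @ tokens
    prob = 1.0 / (1.0 + np.exp(-(context @ w_out)))
    return prob, attention

def ablation_importance(tokens, query, w_out):
    """Absolute change in output when each token is replaced by zeros."""
    base, _ = toy_model(tokens, query, w_out)
    effects = np.zeros(len(tokens))
    for i in range(len(tokens)):
        ablated = tokens.copy()
        ablated[i] = 0.0  # zero baseline for the ablated token (an assumption)
        prob, _ = toy_model(ablated, query, w_out)
        effects[i] = abs(base - prob)
    return effects

# Hypothetical inputs: dimension 0 drives attention, dimension 1 drives output.
tokens = np.array([[1.0, 0.0],
                   [0.0, 3.0],
                   [0.0, -2.0]])
query = np.array([1.5, 0.0])
w_out = np.array([0.0, 1.0])

_, attention = toy_model(tokens, query, w_out)
effects = ablation_importance(tokens, query, w_out)

print("attention weights:", np.round(attention, 3))
print("ablation effects :", np.round(effects, 3))
```

In this example the attention map ranks token 0 first, while the ablation test ranks it last and token 1 first. The disagreement is exactly the kind of signal that comparing methods is meant to surface.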

Using multiple explanation methods together is generally more reliable than relying on attention maps alone. When several independent techniques point to the same important features, confidence in the interpretation increases.
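Agreement between methods can be checked with a rank correlation. The sketch below uses Kendall's tau as one reasonable choice; the importance scores are invented for illustration and do not come from any real model or study.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau-a rank correlation between two equal-length score lists.

    Returns a value from -1 (opposite rankings) to 1 (identical rankings).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical per-token importance scores from three methods.
attention_scores = [0.69, 0.15, 0.10, 0.06]
ablation_scores = [0.04, 0.12, 0.08, 0.01]
gradient_scores = [0.05, 0.30, 0.20, 0.02]

tau_ablation_gradient = kendall_tau(ablation_scores, gradient_scores)
tau_attention_ablation = kendall_tau(attention_scores, ablation_scores)

print("ablation vs gradient :", round(tau_ablation_gradient, 3))
print("attention vs ablation:", round(tau_attention_ablation, 3))
```

Here the ablation and gradient scores rank the tokens identically (tau of 1.0), while attention agrees with ablation only weakly (tau of about 0.33). With scores like these, the two independent methods would deserve more trust than the attention map.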

What this means for understanding AI

Attention maps remain valuable because they can reveal patterns that are difficult to see otherwise. They can help researchers inspect model behaviour, identify biases, and explore how information moves through a Transformer. However, they should not be treated as straightforward proof that a particular input caused a particular answer.

The central lesson is that attention measures information routing, not necessarily importance or causation. As information mixes across layers and multiple computational paths interact, the connection between an attention score and a model’s final decision becomes less direct. Attention visualisations can therefore be informative, but only when interpreted as partial evidence within a broader toolkit of explanation methods. [ACL Anthology, arXiv: Quantifying Attention Flow in Transformers, S. Abnar, 2020]


Endnotes

  1. Source: arxiv.org
    Title: Attention is not Explanation (S. Jain, 2019)
    Link: https://arxiv.org/abs/1902.10186

  2. Source: Wikipedia
    Title: Attention (machine learning)
    Link: https://en.wikipedia.org/wiki/Attention_%28machine_learning%29

  3. Source: arxiv.org
    Title: Attention is Not Only a Weight: Analyzing Transformers with... (G. Kobayashi, 2020)
    Link: https://arxiv.org/abs/2004.10102

  4. Source: arxiv.org
    Title: Quantifying Attention Flow in Transformers
    Link: https://arxiv.org/abs/2005.00928

  5. Source: arxiv.org
    Title: Attention is not not Explanation
    Link: https://arxiv.org/abs/1908.04626

  6. Source: arxiv.org
    Title: Attention Flows are Shapley Value Explanations
    Link: https://arxiv.org/abs/2105.14652

  7. Source: Wikipedia
    Title: Attention
    Link: https://en.wikipedia.org/wiki/Attention

  8. Source: aclanthology.org
    Title: Attention is not Explanation (S. Jain, 2019)
    Link: https://aclanthology.org/N19-1357/

  9. Source: aclanthology.org
    Title: Quantifying Attention Flow in Transformers (S. Abnar, 2020)
    Link: https://aclanthology.org/2020.acl-main.385/

  10. Source: samiraabnar.github.io
    Title: Quantifying Attention Flow in Transformers
    Link: https://samiraabnar.github.io/articles/2020-04/attention_flow

  11. Source: aclanthology.org
    Title: Attention is not not Explanation (S. Wiegreffe, 2019)
    Link: https://aclanthology.org/D19-1002/

  12. Source: aclanthology.org
    Title: Attention is not Explanation (S. Jain, 2019), PDF
    Link: https://aclanthology.org/N19-1357.pdf

  13. Source: researchgate.net
    Title: Attention is not Explanation
    Link: https://www.researchgate.net/publication/331396991_Attention_is_not_Explanation

  14. Source: github.com
    Title: Code for EMNLP 2019 paper "Attention is not..."
    Link: https://github.com/sarahwie/attention

  15. Source: liner.com
    Title: Attention is not Explanation (Quick Review)
    Link: https://liner.com/review/attention-is-not-explanation

Additional References

  1. Source: medium.com
    Title: Attention is not not Explanation
    Link: https://medium.com/%40yuvalpinter/attention-is-not-not-explanation-dbc25b534017

  2. Source: bibbase.org
    Title: Attention is not not Explanation
    Link: https://bibbase.org/network/publication/wiegreffe-pinter-attentionisnotnotexplanation-2019

  3. Source: openreview.net
    Title: Generalized Attention Flow
    Link: https://openreview.net/pdf/77bf7201d7f76cb90754860c7a126c14ffb9c5ba.pdf

  4. Source: semanticscholar.org
    Title: Attention is not Explanation
    Link: https://www.semanticscholar.org/paper/Attention-is-not-Explanation-Jain-Wallace/1e83c20def5c84efa6d4a0d80aa3159f55cb9c3f

  5. Source: elib.uni-stuttgart.de
    Title: Systematic Review of Explainable AI Methods for... (M. Denizoglu)
    Link: https://elib.uni-stuttgart.de/bitstreams/af346489-f4ec-4c08-b581-77bf6f6c370a/download

  6. Source: openreview.net
    Title: Attention Flows for General Transformers (N. Metzger)
    Link: https://openreview.net/forum?id=pcBJT4bgbpH

  7. Source: medium.com
    Title: Exploring Visual Attention in Transformer Models
    Link: https://medium.com/%40nivonl/exploring-visual-attention-in-transformer-models-ab538c06083a

  8. Source: researchgate.net
    Title: Quantifying Attention Flow in Transformers
    Link: https://www.researchgate.net/publication/343298597_Quantifying_Attention_Flow_in_Transformers

  9. Source: youtube.com
    Title: Self Attention Geometric Intuition | How to Visualize Self Attention
    Link: https://www.youtube.com/watch?v=5ZgGuujZSbs

  10. Source: bibbase.org
    Title: Attention is not Explanation (S. Jain and B. C. Wallace, 2019)
    Link: https://bibbase.org/network/publication/jain-wallace-attentionisnotexplanation-2019
