Within Image layers
Can we trust pictures of AI neurons?
Feature visualisation can reveal useful internal patterns, but synthetic images may make neurons look cleaner than they really are.
On this page
- What activation maximised images can reveal
- Why one neuron may have multiple meanings
- How real image examples can check synthetic views
Page outline Jump by section
Introduction
Feature visualisation is one of the most striking tools in AI interpretability. Researchers can generate synthetic images that strongly activate a neuron or layer inside an image-recognition network, producing pictures that appear to reveal what the system has learned. These images have helped show how deep networks move from simple edge detectors to representations of textures, object parts and higher-level concepts. However, the resulting pictures are not direct photographs of a neuron’s “thoughts”. They are interpretations produced by a method, and that method has important limitations. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
The key lesson is not that feature visualisation is useless. It is that visualisations can be persuasive while still being incomplete. A synthetic image may highlight one aspect of a neuron’s behaviour, hide others, or exaggerate how cleanly a concept is represented inside a network. Understanding these limitations is essential when using visualisations to explain how image layers turn pixels into objects. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
What activation-maximised images can reveal
Most feature visualisation methods use a technique called activation maximisation. Starting from random noise, the algorithm gradually modifies an image so that a chosen neuron, channel or class becomes increasingly active. The final image is then treated as a clue about what patterns the network prefers. [christophm.github.io]christophm.github.ioFeature Visualization visualizes the learned features by activation…Read more…
This approach has produced valuable insights. Early-layer visualisations often resemble edges, colour contrasts and simple textures. Deeper-layer visualisations can contain fur-like patterns, wheels, faces, feathers or other structures that correspond to meaningful visual features. Such results helped demonstrate that image networks frequently build hierarchical representations rather than relying solely on memorised templates. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
The problem is that activation-maximised images are not natural photographs. They are artificial solutions to an optimisation problem. The algorithm is searching for whatever input most strongly excites a target neuron, not for a typical image that humans would encounter. As a result, visualisations can contain unusual combinations of textures, colours and shapes that rarely occur in real-world scenes. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
A neuron that helps detect dog faces, for example, might be visualised as a surreal blend of fur textures, eye-like structures and abstract patterns. The image may communicate something genuine about the neuron’s preferences, but it can also create the impression that the neuron is far more specialised or interpretable than it really is. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
Why one neuron may have multiple meanings
One of the most important discoveries in interpretability research is that many neurons are not dedicated detectors for a single concept. Instead, they can respond to several different visual patterns. Researchers sometimes describe this as a neuron being “multifaceted” or, more broadly, as a form of polysemantic behaviour. [ResearchGate+2arXiv]researchgate.netResearch Gate Uncovering the Different Types of Features Learned ByUncovering the Different Types of Features Learned By…February 11, 2016 — A limitation of current techniques is that they…
This creates a problem for simple feature visualisations. If a neuron activates for multiple distinct situations, a single synthetic image may blend those situations together. The resulting picture can look confusing because it is effectively averaging several meanings into one visual artefact. [ResearchGate]researchgate.netResearch Gate Uncovering the Different Types of Features Learned ByUncovering the Different Types of Features Learned By…February 11, 2016 — A limitation of current techniques is that they…
Researchers studying multifaceted feature visualisation highlighted this issue using examples where a neuron responded to very different appearances associated with the same category. A neuron linked to grocery-store recognition, for instance, might activate for both storefront views and rows of produce. A single activation-maximised image can merge these patterns into an unrealistic hybrid that never appears in nature. [ResearchGate]researchgate.netResearch Gate Uncovering the Different Types of Features Learned ByUncovering the Different Types of Features Learned By…February 11, 2016 — A limitation of current techniques is that they…
This matters because readers often interpret neuron images too literally. Seeing a clean synthetic pattern can encourage the belief that a neuron corresponds neatly to a single human concept. In reality, the network’s representation may be distributed across many neurons, while individual neurons participate in multiple overlapping functions. [netdissect.csail.mit.edu+2Wikipedia]netdissect.csail.mit.eduOpen source on mit.edu.
Why synthetic images can overstate interpretability
Another risk is that feature visualisations can make a network appear more understandable than it actually is.
The Network Dissection project was partly motivated by concerns that visualisations themselves require human interpretation. Looking at a synthetic image still leaves researchers asking what concept it represents and how consistently that concept appears in real data. The image is an interpretation of the model, not a direct measurement of its internal semantics. [CVF Open Access]openaccess.thecvf.comBau Network Dissection Quantifying CVPR 2017 paperCVF Open AccessQuantifying Interpretability of Deep Visual Representationsby D Bau · 2017 · Cited by 2331 — Visualizations digest the mec…
Researchers have also shown that visual interpretations can be manipulated. Studies on attacks against neuron interpretation demonstrated that model representations can sometimes be altered so that feature visualisations change dramatically while predictive performance remains largely intact. In other words, a compelling visual explanation does not necessarily correspond to a uniquely correct understanding of what the model is doing. [OpenReview]openreview.netOpen Review Adversarial Attacks on Neuron Interpretation via ActivationAdversarial Attacks on Neuron Interpretation via Activation…November 18, 2023 — We study the feature visualization of a neur…
The broader lesson is that attractive images should not be confused with definitive explanations. A visualisation may reveal a genuine tendency inside a network while still omitting important aspects of how that representation operates in practice. [OpenReview]openreview.netOpen Review Adversarial Attacks on Neuron Interpretation via ActivationAdversarial Attacks on Neuron Interpretation via Activation…November 18, 2023 — We study the feature visualization of a neur…
How real-image examples can check synthetic views
One way researchers reduce these risks is by comparing synthetic visualisations with real images that strongly activate the same neuron.
Instead of asking only, “What image can we generate to maximise activation?”, researchers also ask, “Which photographs in a dataset naturally activate this unit?” Looking at real examples often reveals whether the synthetic image corresponds to genuine behaviour or whether it emphasises unusual optimisation artefacts. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
This comparison can expose hidden complexity. A synthetic image might suggest that a neuron detects a dog’s nose, yet examination of real-image activations could show responses to several related face structures rather than a single isolated part. Conversely, repeated activation across many photographs can strengthen confidence that the visualisation has identified a meaningful pattern. [Distill]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
Methods such as Network Dissection push this idea further by measuring how strongly units align with labelled concepts in large image datasets. Rather than relying entirely on visual inspection, they attempt to quantify whether a unit consistently corresponds to objects, parts, textures, colours or other semantic categories. [netdissect.csail.mit.edu]netdissect.csail.mit.eduOpen source on mit.edu.
Can we trust pictures of AI neurons?
The most accurate answer is: trust them as clues, not as ground truth.
Feature visualisation has been enormously useful for understanding how image networks organise visual information. It has revealed evidence for edges, textures, object parts and higher-level structures emerging across layers. Yet the same research community that developed these methods has repeatedly emphasised their limitations. Synthetic images are optimisation products, neurons may have multiple meanings, and visual explanations can oversimplify distributed representations. [Distill+2christophm.github.io]distill.pubFeature VisualizationFeature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely…
Evidence from interpretability studies suggests that feature visualisations work best when combined with other checks: inspection of real activating images, quantitative concept analysis and broader examination of how groups of units work together. When used in that way, they provide valuable windows into AI vision. When treated as literal pictures of what a neuron “sees”, they can become misleading. netdissect.csail.mit.edu+2CVF Open Access [netdissect.csail.mit.edu]netdissect.csail.mit.eduOpen source on mit.edu.
Amazon book picks
Further Reading
Books and field guides related to Can we trust pictures of AI neurons?. Use these as the next step if you want deeper reading beyond the article.
Deep Learning with Python
Explains how deep networks learn representations that can later be visualised and interpreted.
Deep Learning
Rating: 3.5/5 from 6 Google Books ratings
Strong coverage of learned features, representation learning, and neural network internals relevant to feature visualisation.
The Deep Learning Revolution
Provides broader context for why interpretability and representation learning matter.
Interpretable Machine Learning
Directly addresses methods for understanding model behaviour and limitations of explanations.
Endnotes
-
Source: distill.pub
Title: Feature Visualization
Link: https://distill.pub/2017/feature-visualizationSource snippet
Feature VisualizationNovember 7, 2017 — by C Olah · 2017 · Cited by 1615 — Diverse feature visualizations allow us to more closely...
Published: November 7, 2017
-
Source: christophm.github.io
Link: https://christophm.github.io/interpretable-ml-book/cnn-features.htmlSource snippet
Feature Visualization visualizes the learned features by activation...Read more...
-
Source: researchgate.net
Title: Research Gate Uncovering the Different Types of Features Learned By
Link: https://www.researchgate.net/publication/301845946_Multifaceted_Feature_Visualization_Uncovering_the_Different_Types_of_Features_Learned_By_Each_Neuron_in_Deep_Neural_NetworksSource snippet
Uncovering the Different Types of Features Learned By...February 11, 2016 — A limitation of current techniques is that they...
Published: February 11, 2016
-
Source: arxiv.org
Link: https://arxiv.org/abs/1602.03616 -
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Polysemanticity -
Source: netdissect.csail.mit.edu
Link: https://netdissect.csail.mit.edu/ -
Source: arxiv.org
Title: arXiv Interpreting Neural Networks through the Polytope Lens
Link: https://arxiv.org/abs/2211.12312 -
Source: openreview.net
Title: Open Review [Adversarial]({{ ‘stress-tests/’ | relative_url }}) Attacks on Neuron Interpretation via Activation
Link: https://openreview.net/pdf/637f5c318237190cce1ff2528a37fc06346a5812.pdfSource snippet
Adversarial Attacks on Neuron Interpretation via Activation...November 18, 2023 — We study the feature visualization of a neur...
Published: November 18, 2023
-
Source: arxiv.org
Link: https://arxiv.org/abs/2106.12447 -
Source: arxiv.org
Link: https://arxiv.org/html/2508.07281v1Source snippet
Representation Understanding via Activation Maximization10 Aug 2025 — In this work, we propose a unified feature visualization framework...
-
Source: arxiv.org
Link: https://arxiv.org/html/2604.08039v1Source snippet
LLM-based Iterative Neuron Explanations for Vision Models9 Apr 2026 — Interpreting the concepts encoded by individual neurons in deep neu...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/320971142_Network_Dissection_Quantifying_Interpretability_of_Deep_Visual_RepresentationsSource snippet
Quantifying Interpretability of Deep Visual RepresentationsThese "Network Dissection" approaches [Bau et al., 2017] enabled systematic ch...
-
Source: github.com
Link: https://github.com/CSAILVision/NetDissect-LiteSource snippet
Network Dissection Lite in PyTorchThis repository is a light version of NetDissect, which contains the demo code for the work Network Dis...
-
Source: distill.pub
Title: activation atlas
Link: https://distill.pub/2019/activation-atlasSource snippet
Exploring Neural Networks with Activation Atlasesby S Carter · 2019 · Cited by 260 — We create an explorable activation atlas of features...
-
Source: openreview.net
Link: https://openreview.net/pdf/a302e0072a6e15c8c0361c022bb9d3518f1a7127.pdf -
Source: openaccess.thecvf.com
Title: Bau Network Dissection Quantifying CVPR 2017 paper
Link: https://openaccess.thecvf.com/content_cvpr_2017/papers/Bau_Network_Dissection_Quantifying_CVPR_2017_paper.pdfSource snippet
CVF Open AccessQuantifying Interpretability of Deep Visual Representationsby D Bau · 2017 · Cited by 2331 — Visualizations digest the mec...
Additional References
-
Source: openfl.pressbooks.pub
Link: https://openfl.pressbooks.pub/unfbusinessanalytics/chapter/feature-visualization/Source snippet
Feature Visualization – [Business]({{ 'business-adoption/' | relative_url }}) AnalyticsFeature visualization for a unit of a neural network is done by finding the input that maximize...
-
Source: lmb.informatik.uni-freiburg.de
Link: https://lmb.informatik.uni-freiburg.de/lectures/seminar_brox/seminar_ss22/network_vis.pdfSource snippet
VisualizationsImages synthesized by Activation-Maximization are NOT more helpful than other kinds of visualizations • Experiment is limit...
-
Source: linkedin.com
Link: https://www.linkedin.com/pulse/overview-feature-visualization-activation-am-method-shrivastava-bdkmcSource snippet
An overview of Feature Visualization: Activation...Activation Maximization (AM) is a popular method for feature visualization...
-
Source: medium.com
Link: https://medium.com/%40hke22/techniques-of-feature-visualisation-activation-maximisation-in-convolutional-neural-networks-07443d822380Source snippet
It aims at synthesising a specific input that maximises a neuron's activation with a...Read more...
-
Source: anhnguyen.me
Link: https://anhnguyen.me/2016/mfv/Source snippet
s only one type of feature, but we know that neurons can be multifaceted, in that they...
-
Source: semanticscholar.org
Link: https://www.semanticscholar.org/paper/4fc46f52419f37bd3d3539b8442ea232e24d0e00Source snippet
terpretability of the units inside a deep convolutional neural networks...
-
Source: ruthfong.com
Title: Understanding Convolutional Neural Networks
Link: https://www.ruthfong.com/files/fong20_thesis.pdfSource snippet
AbstractIn this thesis, we introduce several methods for understanding convolutional neural networks (CNNs), the class of [deep learning]({{ 'deep-learning/' | relative_url }}) m...
-
Source: aisafety.info
Link: https://aisafety.info/questions/8HIA/What-is-feature-visualizationSource snippet
into the concepts that neural networks have...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=Xy6RcjXMa2cSource snippet
propose a general framework called Network Dissection for quantifying...
-
Source: youtube.com
Title: ‘How neural networks learn’
Link: https://www.youtube.com/watch?v=McgxRxi2JqoSource snippet
Interpretability vs. Explainability in Machine Learning...
Topic Tree



