When high accuracy is the warning sign

Introduction

A machine-learning model can achieve extremely high training accuracy and still be learning the wrong thing. This happens because training accuracy measures how well a model performs on examples it has already seen, not whether it has discovered patterns that will work on new data. A model may appear successful simply because it has memorised details of the training set, including mistakes, coincidences and irrelevant features. When that happens, the impressive score hides a deeper problem: the model has learned how to reproduce the training examples rather than how to generalise beyond them. This distinction lies at the heart of overfitting and explains why developers rely on unseen test data rather than training results alone. [OpenReview]

Hidden memorisation illustration 1

Why training scores can be misleading

Training accuracy answers a narrow question: “How many training examples did the model get right?” It does not answer the more important question: “Will the model get future examples right?”
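
The two questions can be written side by side. The notation here is an assumption introduced for illustration and does not come from the cited sources: a model $f$, $n$ training pairs $(x_i, y_i)$, and an unknown distribution $\mathcal{D}$ from which future examples are drawn.

```latex
\[
\widehat{\mathrm{acc}}_{\text{train}}(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[f(x_i) = y_i\right],
\qquad
\mathrm{acc}_{\mathcal{D}}(f) = \Pr_{(x,y)\sim\mathcal{D}}\left[f(x) = y\right].
\]
```

The first quantity is computed directly from the training set. The second cannot be computed at all; it can only be estimated, typically from examples the model has not seen.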

A sufficiently flexible model can often reduce training errors by storing increasingly specific information about individual examples. If the dataset contains accidental patterns, unusual cases or noisy labels, the model can absorb those as well. As training continues, the score on the training set rises, creating the impression that learning is improving. In reality, some of that improvement may come from remembering peculiarities that exist only in the training data. [OpenReview]

Researchers studying memorisation in deep learning have repeatedly found that modern neural networks can fit even random or noisy data. The ability to achieve near-perfect training performance therefore does not automatically prove that meaningful structure has been learned. Instead, it demonstrates that the model has enough capacity to reproduce the examples it was shown. [arXiv]
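
The point about capacity can be illustrated with a deliberately crude stand-in. The sketch below is not a neural network: it is a hypothetical lookup table that stores every training example, trained on labels that are coin flips. The data, the seed and the fallback rule are all assumptions made for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: inputs are just ID numbers and labels are coin flips,
# so no rule connects an input to its label.
train = {x: rng.random() < 0.5 for x in range(500)}
test = {x: rng.random() < 0.5 for x in range(500, 1500)}

# A "model" with unlimited capacity: a table of every training example,
# falling back to the most common training label for unseen inputs.
lookup = dict(train)
majority_label = sum(train.values()) * 2 >= len(train)

def predict(x):
    return lookup.get(x, majority_label)

def accuracy(data):
    return sum(predict(x) == y for x, y in data.items()) / len(data)

train_acc = accuracy(train)
test_acc = accuracy(test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The table scores perfectly on the training set by construction, even though there was nothing to learn, while its score on the unseen IDs should sit near chance.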

The warning sign appears when performance on new examples fails to match the impressive training score. A model may report 99% accuracy during training while performing substantially worse on previously unseen data. The gap reveals that some of the apparent success came from memorisation rather than transferable knowledge. [OpenReview]
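
A minimal sketch of how that gap might be checked in practice. The function names, the 0.80 test score and the 0.05 threshold are illustrative assumptions; a sensible threshold depends on the task and on how noisy the test estimate is.

```python
def generalisation_gap(train_accuracy, test_accuracy):
    """Difference between accuracy on seen and unseen examples."""
    return train_accuracy - test_accuracy

def looks_memorised(train_accuracy, test_accuracy, max_gap=0.05):
    """Flag a run whose gap exceeds max_gap (an arbitrary, illustrative threshold)."""
    return generalisation_gap(train_accuracy, test_accuracy) > max_gap

# 0.99 is the training figure used in the text; 0.80 is a made-up test score.
gap = generalisation_gap(0.99, 0.80)
print(f"gap: {gap:.2f}")
print("suspicious:", looks_memorised(0.99, 0.80))
```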

Memorisation versus learning the rule

The key difference is not whether a model remembers information. All useful machine-learning systems retain information from training. The question is what kind of information they retain.

A model that learns the underlying rule identifies relationships that remain valid when new examples arrive. For example, a system trained to recognise handwritten digits should focus on shapes and structures that distinguish one number from another.

A model that memorises instead learns facts about particular training examples. It may effectively treat each example as a separate case rather than discovering a general principle. This strategy can produce excellent training performance because the training examples never change. However, when unfamiliar inputs appear, the memorised details provide little guidance. [Infinite Faculty]

An everyday analogy is a student preparing for an examination. One student memorises the answers to every practice question. Another learns the method behind the questions. Both may score perfectly on the practice set, but only the second student is likely to succeed when presented with new problems. Training accuracy alone cannot distinguish between these two forms of success.
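
The two students can be sketched in a few lines. Everything here is a toy assumption: the task (deciding whether an integer is even), the question ranges, and the fact that the second student's method is written by hand rather than learned from data.

```python
# Hypothetical task: decide whether an integer is even.
def true_label(x):
    return x % 2 == 0

practice = list(range(20))      # questions seen during revision
exam = list(range(100, 120))    # new questions

# Student 1 memorises the answer to every practice question and
# answers False for anything unfamiliar.
answer_sheet = {x: true_label(x) for x in practice}

def memoriser(x):
    return answer_sheet.get(x, False)

# Student 2 applies the method itself.
def rule_learner(x):
    return x % 2 == 0

def accuracy(model, questions):
    return sum(model(x) == true_label(x) for x in questions) / len(questions)

scores = {
    "memoriser": (accuracy(memoriser, practice), accuracy(memoriser, exam)),
    "rule learner": (accuracy(rule_learner, practice), accuracy(rule_learner, exam)),
}
for name, (seen, unseen) in scores.items():
    print(f"{name}: practice {seen:.2f}, exam {unseen:.2f}")
```

Both students score 1.00 on the practice set. Only the exam separates them: the memoriser falls to 0.50, which is what a constant answer earns on this task.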

Why memorisation can look like learning

The difficulty is that both behaviours often produce the same result on the training set. A model that has genuinely learned a rule and a model that has merely memorised examples may each achieve 100% training accuracy.

From the perspective of the training data, there is no obvious difference. The distinction becomes visible only when the model encounters examples that were not available during training. This is why machine-learning evaluation depends so heavily on validation and test sets. They expose whether the model has captured a reusable pattern or simply remembered the training material. [Cross Validated]
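
One simple way to set such examples aside is a single shuffled split. The sketch below uses only the Python standard library; the function name `split_dataset` and the 60/20/20 proportions are assumptions for illustration, not a recommendation from the cited sources.

```python
import random

def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The validation set is the one consulted while tuning the model; the test set is held back for a final check, so that repeated tuning does not leak information from it.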

Hidden memorisation illustration 2

How noisy details become shortcuts

One of the most common paths to hidden memorisation is shortcut learning. Instead of learning the feature that truly matters, a model discovers an easier signal that happens to correlate with the correct answer in the training data. [Communications of the ACM, Frontiers]

Imagine a dataset built to teach a model to tell wolves from dogs. Suppose most photographs of wolves were taken in snowy environments while most photographs of dogs were not. A model might learn to associate snow with wolves. Training accuracy could become very high because the shortcut works on the training images. Yet the model would fail when shown a wolf standing on grass.

Researchers describe these misleading signals as spurious correlations: relationships that appear useful in the training data but do not represent the true cause of the outcome. Such correlations often disappear when conditions change, leading to poor generalisation. [arXiv, Frontiers]

The problem is especially deceptive because exploiting shortcuts frequently improves training accuracy faster than learning the deeper pattern. The model is rewarded for taking the easy route, even if that route will later fail.
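
The wolf-and-snow scenario can be reduced to a toy learner that picks whichever single feature scores best on the training set. The feature names (`snow`, `wolf_shape`), the counts and the one-feature "stump" learner are all invented for this sketch; real image classifiers are far more complex, but the incentive is the same.

```python
def make(snow, wolf_shape, is_wolf, count):
    """Build `count` identical toy photos described by two binary features."""
    return [{"snow": snow, "wolf_shape": wolf_shape, "is_wolf": is_wolf}] * count

# Training photos: background matches the label perfectly; the animal's
# shape is the real signal but is misread in one photo per class.
train = (
    make(1, 1, True, 7) + make(1, 0, True, 1)      # wolves, always in snow
    + make(0, 0, False, 7) + make(0, 1, False, 1)  # dogs, never in snow
)
# New conditions: wolves on grass and dogs in snow.
test = make(0, 1, True, 4) + make(1, 0, False, 4)

def stump_accuracy(rows, feature):
    """Accuracy of the rule 'predict wolf when the feature equals 1'."""
    return sum((row[feature] == 1) == row["is_wolf"] for row in rows) / len(rows)

def fit_stump(rows, features):
    """Pick the single feature with the highest training accuracy."""
    return max(features, key=lambda f: stump_accuracy(rows, f))

chosen = fit_stump(train, ["snow", "wolf_shape"])
print("feature chosen:", chosen)
print("training accuracy:", stump_accuracy(train, chosen))
print("test accuracy:", stump_accuracy(test, chosen))
print("test accuracy of wolf_shape:", stump_accuracy(test, "wolf_shape"))
```

Because the learner is scored only on training accuracy, it prefers `snow` (1.0) over `wolf_shape` (0.875), and that choice is wrong on every test photo once the background no longer tracks the animal.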

Noise can become part of the model

Training datasets often contain imperfections:

Incorrect labels
Measurement errors
Unusual outlier examples
Accidental correlations

A powerful model can eventually fit these irregularities as well as the genuine signal. Research on memorisation shows that neural networks are capable of learning noisy examples that contain little useful information for future predictions. When this happens, the model’s training accuracy continues to climb even though its ability to generalise may stagnate or worsen. [arXiv, OpenReview]

This explains why a steadily improving training score is not always good news. Sometimes it indicates that the model has moved beyond learning the main pattern and has started encoding noise.
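
A small nearest-neighbour example shows the effect with label noise. The threshold rule, the two flipped labels and the choice of k are assumptions made up for this sketch; k = 1 stands in for a model flexible enough to fit every training label, and k = 3 for one that smooths over isolated errors.

```python
def true_label(x):
    return x >= 50

# Toy training set: points 0, 4, ..., 96 with two deliberately flipped labels.
flipped = {20, 72}
train = [(x, true_label(x) != (x in flipped)) for x in range(0, 100, 4)]
# Clean test points 1, 5, ..., 97 that never appear in training.
test = [(x, true_label(x)) for x in range(1, 100, 4)]

def knn_predict(x, k):
    """Majority vote among the k training points closest to x (k odd)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return 2 * votes > k

def accuracy(points, k):
    return sum(knn_predict(x, k) == y for x, y in points) / len(points)

results = {k: (accuracy(train, k), accuracy(test, k)) for k in (1, 3)}
for k, (train_acc, test_acc) in results.items():
    print(f"k={k}: training {train_acc:.2f}, test {test_acc:.2f}")
```

With k = 1 the model reproduces both flipped labels, scoring 1.00 on training and 0.92 on the test points. With k = 3 it refuses to fit them, scoring 0.92 on training and 1.00 on the test points. In this toy setting, the higher training score is the worse model.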

Hidden memorisation illustration 3

When perfect training accuracy should raise suspicion

Perfect training accuracy is not automatically a problem. Some well-designed models achieve it and still generalise successfully. However, perfection deserves scrutiny because it can signal that memorisation has become a major part of the learning process. [Wikipedia: Double descent]

Modern research has revealed surprising behaviour in large neural networks. In some situations, models first overfit and memorise before later developing more general solutions. Studies of phenomena such as grokking and double descent show that the relationship between memorisation and generalisation is more complex than older textbook explanations suggested. Even so, these findings do not eliminate the need for testing on unseen data. They reinforce it. A training score alone still cannot reveal whether the model’s apparent success comes from genuine understanding or from fitting the peculiarities of the training set. [Pair with Google, Wikipedia]

The practical lesson is simple: high training accuracy measures success on familiar examples, not usefulness in the real world. Without evaluating new examples, there is no reliable way to tell whether a model has learned a durable rule or merely become an expert at remembering its training data. [OpenReview]

Endnotes

Source: openreview.net
Link: https://openreview.net/pdf?id=vVhZh9ZpIM
Source snippet
WHEN MEMORIZATION HURTS GENERALIZATIONApril 1, 2025 — by R Bayat · Cited by 19 — In this situation, memorization is ugly: The m...

Published: April 1, 2025
Source: arxiv.org
Link: https://arxiv.org/html/2406.03880v1
Source snippet
Memorization in deep learning: A survey6 Jun 2024 — This survey offers the first-in-kind understanding of memorization in DNNs, prov...
Source: cacm.acm.org
Title: shortcut learning of large language models in natural language understanding
Link: https://cacm.acm.org/research/shortcut-learning-of-large-language-models-in-natural-language-understanding/
Source snippet
The method synthesizes a pair...Read more...
Source: arxiv.org
Link: https://arxiv.org/html/2402.12715v1
Source snippet
Spurious Correlations in Machine Learning: A Survey20 Feb 2024 — Spurious correlation, namely “correlations that do not imply causat...
Source: arxiv.org
Link: https://arxiv.org/html/2310.13572v3
Source snippet
Unraveling the Enigma of Double Descent: An In-depth...30 Apr 2024 — In this study, we revisit the phenomenon of double descent and demo...
Source: Wikipedia
Title: Double descent
Link: https://en.wikipedia.org/wiki/Double_descent
Source: Wikipedia
Title: Grokking (machine learning)
Link: https://en.wikipedia.org/wiki/Grokking_%28machine_learning%29
Source: arxiv.org
Link: https://arxiv.org/html/2412.05152v1
Source snippet
Navigating Shortcuts, Spurious Correlations, and...6 Dec 2024 — One of the most general definitions of a spurious correlation is "a corr...
Source: openreview.net
Link: https://openreview.net/pdf?id=12RoR2o32T
Source snippet
Shortcut learning in deep neural...Read more...
Source: dl.acm.org
Link: https://dl.acm.org/doi/10.1145/3769076
Source snippet
in Deep Learning: A SurveyFinally, they validate the memorization effect (i.e., accuracy of noisy examples in the training dataset) on mo...
Source: stats.stackexchange.com
Title: what should i do when my neural network doesnt generalize well
Link: https://stats.stackexchange.com/questions/365778/what-should-i-do-when-my-neural-network-doesnt-generalize-well
Source snippet
Cross ValidatedWhat should I do when my neural network doesn't generalize...7 Sept 2018 — I'm training a neural network and the training...
Source: infinitefaculty.substack.com
Title: Infinite Faculty Memorization vs
Link: https://infinitefaculty.substack.com/p/memorization-vs-generalization-in
Source snippet
generalization in deep learning: implicit...February 18, 2026 — Overfitting matches all the training data points perfectly, but makes wo...

Published: February 18, 2026
Source: frontiersin.org
Link: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1692454/full
Source snippet
Unmasking the Clever Hans effect in AI modelsby AK Pathak · 2025 · Cited by 2 — The Clever Hans effect in AI can be formalized using the...
Source: pair.withgoogle.com
Link: https://pair.withgoogle.com/explorables/grokking/
Source snippet
Pair with GoogleDo Machine Learning Models Memorize or Generalize?by A Pearce · Cited by 17 — With too little weight decay, the model can...

