When high accuracy is the warning sign
High training scores can mask a model that has memorised noise instead of learning patterns that work on new examples.
Introduction
A machine-learning model can achieve extremely high training accuracy and still be learning the wrong thing. Training accuracy measures how well a model performs on examples it has already seen, not whether it has discovered patterns that will hold on new data. A model may appear successful simply because it has memorised details of the training set, including mistakes, coincidences and irrelevant features. When that happens, the impressive score hides a deeper problem: the model has learned to reproduce the training examples rather than to generalise beyond them. This distinction lies at the heart of overfitting and explains why developers rely on unseen test data rather than training results alone. [OpenReview]
Why training scores can be misleading
Training accuracy answers a narrow question: “How many training examples did the model get right?” It does not answer the more important question: “Will the model get future examples right?”
A sufficiently flexible model can often reduce training errors by storing increasingly specific information about individual examples. If the dataset contains accidental patterns, unusual cases or noisy labels, the model can absorb those as well. As training continues, the score on the training set rises, creating the impression that learning is improving. In reality, some of that improvement may come from remembering peculiarities that exist only in the training data. [OpenReview]
Researchers studying memorisation in deep learning have repeatedly found that modern neural networks can fit even random or noisy data. The ability to achieve near-perfect training performance therefore does not automatically prove that meaningful structure has been learned. Instead, it demonstrates that the model has enough capacity to reproduce the examples it was shown. [arXiv]
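As a minimal sketch of this point, the toy example below stores every training pair in a dictionary and scores it on labels produced by coin flips. The data, the `memoriser` function and its fallback guess of 0 are illustrative assumptions, not a real neural network. The aim is only to show that perfect training accuracy is reachable even when there is no structure to learn.

```python
import random

def accuracy(predict, examples):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

rng = random.Random(0)

# Hypothetical data: inputs are distinct integers and labels are coin flips,
# so no rule connects an input to its label.
train = [(x, rng.randint(0, 1)) for x in range(0, 1000)]
test = [(x, rng.randint(0, 1)) for x in range(1000, 2000)]

# A "model" with enough capacity to store every training example.
memory = dict(train)

def memoriser(x):
    # Return the stored label if x was seen in training, otherwise guess 0.
    return memory.get(x, 0)

train_acc = accuracy(memoriser, train)
test_acc = accuracy(memoriser, test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Training accuracy is perfect by construction, while accuracy on the unseen inputs should sit close to chance because the stored labels carry no information about them.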
The warning sign appears when performance on new examples fails to match the impressive training score. A model may report 99% accuracy during training while performing substantially worse on previously unseen data. The gap reveals that some of the apparent success came from memorisation rather than transferable knowledge. [OpenReview]
Memorisation versus learning the rule
The key difference is not whether a model remembers information. All useful machine-learning systems retain information from training. The question is what kind of information they retain.
A model that learns the underlying rule identifies relationships that remain valid when new examples arrive. For example, a system trained to recognise handwritten digits should focus on shapes and structures that distinguish one number from another.
A model that memorises instead learns facts about particular training examples. It may effectively treat each example as a separate case rather than discovering a general principle. This strategy can produce excellent training performance because the training examples never change. However, when unfamiliar inputs appear, the memorised details provide little guidance. [Infinite Faculty]
An everyday analogy is a student preparing for an examination. One student memorises the answers to every practice question. Another learns the method behind the questions. Both may score perfectly on the practice set, but only the second student is likely to succeed when presented with new problems. Training accuracy alone cannot distinguish between these two forms of success.
Why memorisation can look like learning
The difficulty is that both behaviours often produce the same result on the training set. A model that has genuinely learned a rule and a model that has merely memorised examples may each achieve 100% training accuracy.
From the perspective of the training data, there is no obvious difference. The distinction becomes visible only when the model encounters examples that were not available during training. This is why machine-learning evaluation depends so heavily on validation and test sets. They expose whether the model has captured a reusable pattern or simply remembered the training material. [Cross Validated]
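The contrast can be sketched with two toy models trained on the same data. Everything here is an assumption made for illustration: the ground-truth rule `x >= 50`, the choice of even inputs for training and odd inputs for testing, and the two hypothetical models `memoriser` (a lookup table) and `rule_learner` (a single learned threshold).

```python
def accuracy(predict, examples):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def true_label(x):
    # Hypothetical ground-truth rule, used only to generate the data.
    return int(x >= 50)

train = [(x, true_label(x)) for x in range(0, 100, 2)]  # even inputs
test = [(x, true_label(x)) for x in range(1, 100, 2)]   # odd inputs, never seen

# Memoriser: stores every training pair and guesses 0 for anything unseen.
lookup = dict(train)

def memoriser(x):
    return lookup.get(x, 0)

# Rule learner: places a threshold midway between the largest training input
# labelled 0 and the smallest training input labelled 1.
largest_zero = max(x for x, y in train if y == 0)
smallest_one = min(x for x, y in train if y == 1)
threshold = (largest_zero + smallest_one) / 2

def rule_learner(x):
    return int(x > threshold)

results = {
    "memoriser": (accuracy(memoriser, train), accuracy(memoriser, test)),
    "rule_learner": (accuracy(rule_learner, train), accuracy(rule_learner, test)),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

Both models are perfect on the training set, so the training score cannot separate them. Only the held-out inputs show that the lookup table falls back to guessing while the threshold carries over.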
How noisy details become shortcuts
One of the most common paths to hidden memorisation is shortcut learning. Instead of learning the feature that truly matters, a model discovers an easier signal that happens to correlate with the correct answer in the training data. [Communications of the ACM, Frontiers]
Imagine a dataset designed to identify animals. Suppose most photographs of wolves were taken in snowy environments while most photographs of dogs were not. A model might learn to associate snow with wolves. Training accuracy could become very high because the shortcut works on the training images. Yet the model would fail when shown a wolf standing on grass.
Researchers describe these misleading signals as spurious correlations: relationships that appear useful in the training data but do not represent the true cause of the outcome. Such correlations often disappear when conditions change, leading to poor generalisation. [arXiv, Frontiers]
The problem is especially deceptive because exploiting shortcuts frequently improves training accuracy faster than learning the deeper pattern. The model is rewarded for taking the easy route, even if that route will later fail.
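The wolf-and-snow scenario can be simulated with a deliberately simple learner that picks whichever single feature scores best on the training set. All of the numbers are made up for illustration: `snow` agrees with the label for 95% of training images but is unrelated to it at test time, while the hypothetical `wolf_shape` feature is a genuine but noisy cue that is right 90% of the time in both sets.

```python
def build(snow_for_wolf, snow_for_dog):
    """Create 100 wolves (label 1) and 100 dogs (label 0) as feature dicts."""
    data = []
    for i in range(100):
        # wolf_shape is a genuine but noisy cue: correct for 90 of every 100 animals.
        data.append({"snow": int(snow_for_wolf(i)), "wolf_shape": int(i >= 10), "label": 1})
        data.append({"snow": int(snow_for_dog(i)), "wolf_shape": int(i < 10), "label": 0})
    return data

# Training photos: 95% of wolves are in snow, 95% of dogs are not.
train = build(lambda i: i < 95, lambda i: i >= 95)
# Test photos: snow appears in half the images of each animal.
test = build(lambda i: i % 2 == 0, lambda i: i % 2 == 0)

def feature_accuracy(feature, data):
    """Accuracy of predicting the label directly from one binary feature."""
    return sum(example[feature] == example["label"] for example in data) / len(data)

features = ["snow", "wolf_shape"]
# The learner is rewarded only for training accuracy, so it takes the shortcut.
chosen = max(features, key=lambda f: feature_accuracy(f, train))

for f in features:
    print(f"{f}: train={feature_accuracy(f, train):.2f} test={feature_accuracy(f, test):.2f}")
print("feature chosen from training accuracy:", chosen)
```

Because selection is driven purely by the training score, the learner prefers `snow`, and its accuracy drops to chance on the test photos even though the less impressive `wolf_shape` cue would have carried over.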
Noise can become part of the model
Training datasets often contain imperfections:
- Incorrect labels
- Measurement errors
- Unusual outlier examples
- Accidental correlations
A powerful model can eventually fit these irregularities as well as the genuine signal. Research on memorisation shows that neural networks are capable of learning noisy examples that contain little useful information for future predictions. When this happens, the model’s training accuracy continues to climb even though its ability to generalise may stagnate or worsen. [arXiv, OpenReview]
This explains why a steadily improving training score is not always good news. Sometimes it indicates that the model has moved beyond learning the main pattern and has started encoding noise.
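A small nearest-neighbour experiment gives a rough picture of noise being encoded. The setup is assumed for illustration only: a one-dimensional rule `x >= 50`, four hypothetical mislabelled training points, and a clean test set. With `k=1` the classifier reproduces every training label, including the wrong ones. With `k=3` it takes a majority vote and so cannot fit isolated errors.

```python
def clean_label(x):
    # Hypothetical ground-truth rule used to generate the data.
    return int(x >= 50)

# Four training inputs whose labels are deliberately flipped (label noise).
flipped = {10, 30, 70, 90}

def noisy_label(x):
    return 1 - clean_label(x) if x in flipped else clean_label(x)

train = [(x, noisy_label(x)) for x in range(0, 100, 2)]
test = [(x + 0.5, clean_label(x + 0.5)) for x in range(0, 100, 2)]

def knn_predict(x, k):
    """Majority vote over the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def knn_accuracy(examples, k):
    return sum(knn_predict(x, k) == y for x, y in examples) / len(examples)

scores = {k: (knn_accuracy(train, k), knn_accuracy(test, k)) for k in (1, 3)}
for k, (train_acc, test_acc) in scores.items():
    print(f"k={k}: train={train_acc:.2f} test={test_acc:.2f}")
```

In this toy setting the model with the higher training score is the one that has absorbed the mislabelled points, and it is the weaker of the two on clean test data.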
When perfect training accuracy should raise suspicion
Perfect training accuracy is not automatically a problem. Some well-designed models achieve it and still generalise successfully. However, perfection deserves scrutiny because it can signal that memorisation has become a major part of the learning process. [Wikipedia]
Modern research has revealed surprising behaviour in large neural networks. In some situations, models first overfit and memorise before later developing more general solutions. Studies of phenomena such as grokking and double descent show that the relationship between memorisation and generalisation is more complex than older textbook explanations suggested. Even so, these findings do not eliminate the need for testing on unseen data. They reinforce it. A training score alone still cannot reveal whether the model’s apparent success comes from genuine understanding or from fitting the peculiarities of the training set. [Pair with Google, Wikipedia]
The practical lesson is simple: high training accuracy measures success on familiar examples, not usefulness in the real world. Without evaluating new examples, there is no reliable way to tell whether a model has learned a durable rule or merely become an expert at remembering its training data. [OpenReview]
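One common way to keep new examples available is to set some data aside before training starts. The helper below is a minimal sketch of such a hold-out split. The name `split_dataset`, the 15% fractions and the fixed seed are arbitrary assumptions for illustration, and many libraries provide their own equivalents.

```python
import random

def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of the examples and cut it into train, validation and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical dataset of 100 example identifiers.
examples = list(range(100))
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

The model is fitted on `train` only. The validation part is used for decisions made during development, and the test part is kept back for a final check, so that neither score is inflated by examples the model has already seen.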
Further Reading
Books and field guides related to "When high accuracy is the warning sign". Use these as the next step if you want deeper reading beyond the article.
Hands-on Machine Learning with Scikit-Learn, Keras, and Tenso...
Explains how models can memorise training data instead of learning patterns.
The Hundred-page Machine Learning Book
Covers memorisation, generalisation, and evaluation fundamentals.
Pattern Recognition and Machine Learning
Provides theoretical explanations for overfitting and high training accuracy.
An Introduction to Statistical Learning
Shows why excellent training scores can hide weak real-world performance.
Endnotes
- Source: openreview.net
  Title: When Memorization Hurts Generalization
  Author: R Bayat
  Link: https://openreview.net/pdf?id=vVhZh9ZpIM
  Published: April 1, 2025
- Source: arxiv.org
  Title: Memorization in deep learning: A survey
  Link: https://arxiv.org/html/2406.03880v1
  Published: 6 Jun 2024
- Source: cacm.acm.org
  Title: Shortcut learning of large language models in natural language understanding
  Link: https://cacm.acm.org/research/shortcut-learning-of-large-language-models-in-natural-language-understanding/
- Source: arxiv.org
  Title: Spurious Correlations in Machine Learning: A Survey
  Link: https://arxiv.org/html/2402.12715v1
  Published: 20 Feb 2024
- Source: arxiv.org
  Title: Unraveling the Enigma of Double Descent: An In-depth...
  Link: https://arxiv.org/html/2310.13572v3
  Published: 30 Apr 2024
- Source: Wikipedia
  Title: Double descent
  Link: https://en.wikipedia.org/wiki/Double_descent
- Source: Wikipedia
  Title: Grokking (machine learning)
  Link: https://en.wikipedia.org/wiki/Grokking_%28machine_learning%29
- Source: arxiv.org
  Title: Navigating Shortcuts, Spurious Correlations, and...
  Link: https://arxiv.org/html/2412.05152v1
  Published: 6 Dec 2024
- Source: openreview.net
  Title: Shortcut learning in deep neural...
  Link: https://openreview.net/pdf?id=12RoR2o32T
- Source: dl.acm.org
  Title: ...in Deep Learning: A Survey
  Link: https://dl.acm.org/doi/10.1145/3769076
- Source: stats.stackexchange.com (Cross Validated)
  Title: What should I do when my neural network doesn't generalize well
  Link: https://stats.stackexchange.com/questions/365778/what-should-i-do-when-my-neural-network-doesnt-generalize-well
  Published: 7 Sept 2018
- Source: infinitefaculty.substack.com (Infinite Faculty)
  Title: Memorization vs generalization in deep learning: implicit...
  Link: https://infinitefaculty.substack.com/p/memorization-vs-generalization-in
  Published: February 18, 2026
- Source: frontiersin.org
  Title: Unmasking the Clever Hans effect in AI models
  Author: AK Pathak
  Link: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1692454/full
  Published: 2025
- Source: pair.withgoogle.com
  Title: Do Machine Learning Models Memorize or Generalize?
  Author: A Pearce
  Link: https://pair.withgoogle.com/explorables/grokking/