When a model memorises instead of learning

Introduction

A machine-learning model is judged by what it does with data it has never seen before. A model that achieves near-perfect results on its training examples may still be a poor AI system if its performance collapses when faced with new cases. This problem is known as overfitting: the model has learned the training data too closely, including quirks, noise, and accidental patterns that do not hold more generally. The central challenge is therefore not remembering past examples but generalising—making accurate predictions on unfamiliar data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Overfitting illustration 1

When a model looks brilliant but is actually failing

Why training performance can be misleading

During training, a model repeatedly adjusts itself to reduce errors on the examples it is shown. If developers look only at training results, the model may appear increasingly successful. The danger is that the model can start learning details that are specific to the training set rather than learning the underlying pattern that generated the data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Imagine a system learning to identify healthy and diseased plants. A genuinely useful model would discover features related to disease. An overfitted model might instead learn irrelevant details that happened to appear in the training photographs, such as lighting conditions or camera angles. It would score highly on familiar images but perform poorly when confronted with new photographs. This is why strong training accuracy alone is not evidence that a model has learned the right lesson. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…

A common misconception is that more accurate fitting is always better. In reality, a model can become so flexible that it begins to match random fluctuations in the data. The model is then effectively memorising examples instead of extracting a rule that applies beyond them. Researchers often describe this as a high-variance model because small changes in the training data can lead to substantially different learned behaviour. [ApX Machine Learning]apxml.comIt captures not only the underlying patterns but also the noise and random fluctuations specific to the…Read more…

A simple example of memorisation

Suppose a teacher wants students to understand arithmetic. A student who memorises the answers to fifty practice questions may achieve a perfect score on those exact questions. However, if the examination contains different numbers, memorisation offers little help. A student who learned the underlying method performs better on unfamiliar problems.

Machine learning faces the same challenge. The goal is not to reproduce training examples but to learn a pattern that continues to work when circumstances change. This ability is called generalisation, and it is one of the most important measures of machine-learning quality. [Google for Developers]developers.google.comGoogle for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring…

How test sets reveal generalisation

Because training results can be deceptive, machine-learning teams reserve some data that the model never sees during training. This separate collection of examples is commonly called a test set. The model is evaluated on these new examples only after training is complete. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…

The logic is straightforward. If a model performs well on both the training data and the unseen test data, it has probably learned something useful. If training performance is excellent but test performance is much worse, overfitting is likely. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

Many projects also use a validation set, which acts as an intermediate checkpoint during development. Developers use validation results to compare model designs and tune settings, while the test set remains untouched until the final evaluation. This separation helps prevent accidental leakage of information from the test data into the training process. [Unidata]unidata.proValidation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance…

One of the clearest warning signs appears when training error keeps falling while validation or test error stops improving and begins to rise. The model is becoming increasingly specialised to the training examples while losing its ability to handle new ones. Google’s machine-learning guidance identifies this divergence between training and validation performance as a characteristic signal of overfitting. [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Overfitting illustration 2

Why unseen examples matter so much

The real world constantly presents situations that differ from past data. Emails contain new wording, customers change behaviour, medical images come from different equipment, and road conditions vary from those in a training dataset.

A model that only succeeds on familiar examples has little practical value. The usefulness of AI systems depends on their ability to transfer what they learned from past data to future situations. IBM describes generalisation as the key reason machine learning can be used for prediction and classification tasks in everyday applications. When overfitting occurs, that practical value is undermined because performance becomes tied too closely to historical examples. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

This is why benchmark results are usually reported on held-out test data rather than training data. The test score provides a more realistic estimate of how the model is likely to behave outside the laboratory. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…

Ways teams reduce overfitting

Overfitting cannot always be eliminated, but several widely used techniques help reduce it.

Use more representative data. Models tend to generalise better when training examples capture the diversity of real-world situations. Narrow or unrepresentative datasets make it easier for models to learn misleading shortcuts. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Limit unnecessary complexity. Extremely complex models can fit tiny details in the training data. In many cases, a simpler model performs better on unseen examples because it focuses on broader patterns rather than noise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…

Apply regularisation. Regularisation techniques deliberately discourage excessive complexity during training. They introduce penalties that push the model towards simpler explanations, improving its ability to generalise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated…

Use validation and cross-validation. Repeated evaluation on separate data helps detect whether performance gains are genuine or merely the result of memorising the training set. Cross-validation, which rotates different portions of data through training and testing roles, provides a more robust estimate of generalisation performance. [Scikit-learn]scikit-learn.orgUnderfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the…

Stop training at the right time. If validation performance begins to worsen while training performance continues to improve, developers may halt training before the model becomes excessively specialised. Monitoring these curves is a standard defence against overfitting. [Google for Developers]developers.google.cominterpreting loss curvesEnsure that the training set and test set are statistically equivalent. The learning rate is too high. If the…Read more…

Overfitting illustration 3

The key lesson

Overfitting exposes a fundamental truth about artificial intelligence: success on past examples is not the same as understanding a task. A model can appear impressive when judged by the data it has already seen yet fail when confronted with genuinely new situations. Test sets, validation procedures, and other evaluation methods exist to answer the question that matters most: not “How well did the model remember?” but “How well does it perform on the next example?” [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Amazon book picks

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

Example eBay listing

Machine Learning Framed Wall Art Poster Canvas Print Picture

Search eBay.co.uk: machine learning wall art

Browse similar on eBay.co.uk

Example eBay listing

Machine Learning Framed Wall Art Poster Canvas Print Picture

Search eBay.co.uk: machine learning wall art

Browse similar on eBay.co.uk

Example eBay listing

Anti AI Anti Machine Learning Say N Framed Wall Art Poster Canvas Print Picture

Search eBay.co.uk: machine learning wall art

Browse similar on eBay.co.uk

Browse more on eBay.co.uk

Example items shown for inspiration; availability and pricing can change. Branchoria may earn a commission if you purchase through outbound eBay links.

Endnotes

Source: ibm.com
Link: https://www.ibm.com/think/topics/overfitting
Source snippet
What is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/overfitting
Source snippet
Google for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/model-complexity
Source snippet
Google for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m...
Source: ibm.com
Link: https://www.ibm.com/think/topics/overfitting-vs-underfitting
Source snippet
to memorization instead of generalization.Read more...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/generalization
Source snippet
Google for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring...
Source: docs.cloud.google.com
Link: https://docs.cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions
Source snippet
Google Cloud DocumentationGuidelines for developing high-quality, predictive ML...Jul 8, 2024 — Compare your model performance on the tr...
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/stable/modules/learning_curve.html
Source snippet
Validation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over...
Source: unidata.pro
Link: https://unidata.pro/blog/validation-dataset-in-ml/
Source snippet
Validation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/[production
Source snippet
ML systems: Monitoring pipelines16 Oct 2025 — You partition the data carefully, ensuring that your training set is well isolated from you...
Source: developers.google.com
Title: interpreting loss curves
Link: https://developers.google.com/machine-learning/crash-course/overfitting/interpreting-loss-curves
Source snippet
Ensure that the training set and test set are statistically equivalent. The learning rate is too high. If the...Read more...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/glossary/fundamentals
Source snippet
For example, the following generalization curve suggests overfitting because validation loss...Read more...
Source: ibm.com
Link: https://www.ibm.com/think/topics/model-selection
Source snippet
Model Selection in Machine LearningOverfitting means that the model adapts too closely to the training set and cannot generalize to new d...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/regularization
Source snippet
Google for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated...
Source: developers.google.com
Title: Incorrect predictions. Learning rate. Complexity.Read more
Link: https://developers.google.com/machine-learning/crash-course/overfitting/quiz
Source snippet
Google for DevelopersDatasets, generalization, and overfitting: Test Your...Regularization improves your model's ability to generalize t...
Source: scikit-learn.org
Link: https://scikit-learn.org/0.18/auto_examples/model_selection/plot_underfitting_overfitting.html
Source snippet
Underfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting
Source snippet
google.comDatasets, generalization, and overfitting | Machine LearningDec 3, 2025 — In this module, you'll learn more about the character...
Source: developers.google.com
Title: overfitting and pruning
Link: https://developers.google.com/machine-learning/decision-forests/overfitting-and-pruning
Source snippet
and pruningAug 25, 2025 — You can disable pruning with the validation dataset by setting validation_ratio=0.0. Those criteria introduce...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/fairness
Source snippet
Machine LearningAug 25, 2025 — This module looks at different types of human biases that can manifest in training data. It then provide...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/llm
Source snippet
to Large [Language Models]({{ 'language-models/' | relative_url }}) | Machine LearningJan 9, 2026 — This course module provides an overview of language models and large language mo...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/guides/text-classification/step-4
Source snippet
4: Build, Train, and Evaluate Your ModelAug 25, 2025 — In this section, we will work towards building, training and evaluating our model...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/automl
Source snippet
Machine Learning (AutoML)Aug 25, 2025 — This course module teaches best practices for using automated machine learning (AutoML) tools in...
Source: developers.google.com
Title: dividing datasets
Link: https://developers.google.com/machine-learning/crash-course/overfitting/dividing-datasets
Source snippet
the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into training, validation, and test s...
Source: developers.google.com
Title: imbalanced datasets
Link: https://developers.google.com/machine-learning/crash-course/overfitting/imbalanced-datasets
Source snippet
google.comClass-imbalanced datasets | Machine LearningAug 28, 2025 — Learn how to overcome problems with training imbalanced datasets by...
Source: developers.google.com
Title: linear regression
Link: https://developers.google.com/machine-learning/crash-course/linear-regression
Source snippet
regression | Machine LearningDec 9, 2025 — This course module teaches the fundamentals of linear regression, including linear equations...
Source: ibm.com
Link: https://www.ibm.com/
Source snippet
For more than a century, IBM has been a global technology innovator, leading advances in AI, [automation]({{ 'automation-bias/' | relative_url }}) and hybrid cloud solutions tha...
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
Source snippet
Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...
Source: scikit-learn.org
Link: https://scikit-learn.org/
Source snippet
machine learning in Python — scikit-learn 1.9.0...Machine Learning in Python · Simple and efficient tools for predictive d...
Source: scikit-learn.org
Title: Underfitting vs
Link: https://scikit-learn.org/1.3/auto_examples/model_selection/plot_underfitting_overfitting.html
Source snippet
Overfitting — scikit-learn 1.3.2 documentationThis example demonstrates the problems of underfitting and overfitting and how we can use l...
Source: scikit-learn.org
Link: https://scikit-learn.org/0.15/auto_examples/plot_underfitting_overfitting.html
Source snippet
Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/index.html
Source snippet
Some examples demonstrate the use of the API in general and some demonstrate...Read more...
Source: apxml.com
Link: https://apxml.com/courses/getting-started-with-scikit-learn/chapter-5-model-selection-evaluation/overfitting-underfitting
Source snippet
It captures not only the underlying patterns but also the noise and random fluctuations specific to the...Read more...
Source: realpython.com
Link: https://realpython.com/train-test-split-python-data/
Source snippet
Real PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis...
Source: aws.amazon.com
Link: https://aws.amazon.com/what-is/overfitting/
Source snippet
in Machine Learning ExplainedOverfitting is an undesirable machine learning behavior that occurs when the machine learning model gives ac...
Source: stackoverflow.com
Link: https://stackoverflow.com/questions/20357705/scikit-learn-cross-validation-over-fitting-or-under-fitting
Source: coursera.org
Link: https://www.coursera.org/articles/overfitting
Source snippet
Data: A Beginner's Guide23 Oct 2025 — Overfitting is a type of machine learning behavior where the machine learning model is accurate for...
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Overfitting
Source snippet
OverfittingIn mathematical modeling, overfitting is the production of an analysis that corresponds too closely or exactly to a particu...
Source: github.com
Link: https://github.com/cstjean/ScikitLearn.jl/blob/master/examples/Underfitting_vs_Overfitting.ipynb
Source snippet
ScikitLearn.jl/examples/Underfitting_vs_Overfitting.ipynb at...This example demonstrates the problems of underfitting and overfitting an...
Source: scipy-lectures.org
Link: https://scipy-lectures.org/packages/scikit-learn/index.html
Source snippet
scikit-learn: machine learning in PythonAgain, this is an example of fitting a model to data, but our focus here is that the model can ma...

Additional References

Source: medium.com
Link: https://medium.com/%40boutnaru/artificial-intelligence-overfitting-vs-underfitting-66d83d38b1a0
Source snippet
Artificial Intelligence — Overfitting vs UnderfittingOverfitting is a case in which a machine learning model learns the training data too...
Source: inria.github.io
Link: https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html
Source snippet
Overfit-generalization-underfit — Scikit-learn courseIn this notebook, we put these two errors into perspective and show how they can hel...
Source: kaggle.com
Link: https://www.kaggle.com/code/dansbecker/underfitting-and-overfitting
Source snippet
Underfitting and OverfittingThis is a phenomenon called overfitting, where a model matches the training data almost perfectly, but does p...
Source: inria.github.io
Link: https://inria.github.io/scikit-learn-mooc/overfit/overfitting_vs_under_fitting_slides.html
Source snippet
🎥 Overfitting and Underfitting — Scikit-learn courseOverfitting and underfitting. Understand when and why a model does or does not genera...
Source: medium.com
Link: https://medium.com/%40kocyigit.emre/machine-learning-challenges-6-overfitting-a1d46869803f
Source: ai.stackexchange.com
Title: Consider a noisy 2d dataset where I am fitting polynomials. A good model would
Link: https://ai.stackexchange.com/questions/43298/why-does-model-overfitting-lead-to-poor-generalization
Source snippet
does model overfitting lead to poor generalization?2 Jan 2024 — If a model overfit to the training data, why does it generalize poorly?...
Source: youtube.com
Link: https://www.youtube.com/watch?v=tJBfN577mhA
Source snippet
AI Concepts: Overfitting, Underfitting and GeneralisationOverfitting and underfitting are basically problems that prevent model from gene...
Source: youtu.be
Link: https://youtu.be/E3_408q1mjo
Source snippet
Gradient Boosting with Regression Trees Explained: [https://youtu.be/lOwsMpdjxog](https://youtu.be/lOwsMpdjxog) P-Values Explained: [https://youtu.be/IZUfbRvsZ9w](https://youtu.be/IZUfbRvsZ9w) Kabsch-U...
Source: biztechmagazine.com
Title: what overfitting machine learning and how can it be prevented perfcon
Link: https://biztechmagazine.com/article/2022/03/what-overfitting-machine-learning-and-how-can-it-be-prevented-perfcon
Source snippet
Overfitting in Machine Learning: What is it & How Can It Be...16 Mar 2022 — Overfitting refers to the use of a data set that is too clos...
Source: youtu.be
Link: https://youtu.be/IZUfbRvsZ9w
Source snippet
Kabsch-Umeyama Algorithm: [https://youtu.be/nCs_e6fP7Jo](https://youtu.be/nCs_e6fP7Jo) Eigendecomposition Explained: [https://youtu.be/ihUr2LbdYlE](https://youtu.be/ihUr2LbdYlE) Covariance Matrix Expla...

When a model memorises instead of learning

Introduction

When a model looks brilliant but is actually failing

Why training performance can be misleading

A simple example of memorisation

How test sets reveal generalisation

Why unseen examples matter so much

Ways teams reduce overfitting

The key lesson

Further Reading

Hands-on Machine Learning with Scikit-Learn, Keras, and Tenso...

Pattern Recognition and Machine Learning

An Introduction to Statistical Learning

The Hundred-page Machine Learning Book

Marketplace Samples

Machine Learning Framed Wall Art Poster Canvas Print Picture

Machine Learning Framed Wall Art Poster Canvas Print Picture

Anti AI Anti Machine Learning Say N Framed Wall Art Poster Canvas Print Picture

Endnotes

Additional References

Follow this branch

Parent topic

Related pages 4

More on this topic 3