Within Machine Learning
When a model memorises instead of learning
Overfitting explains why a model can look excellent in training but fail when it meets cases it has never seen before.
On this page
- Why training performance can be misleading
- How test sets reveal generalisation
- Ways teams reduce overfitting
Page outline Jump by section
Introduction
A machine-learning model is judged by what it does with data it has never seen before. A model that achieves near-perfect results on its training examples may still be a poor AI system if its performance collapses when faced with new cases. This problem is known as overfitting: the model has learned the training data too closely, including quirks, noise, and accidental patterns that do not hold more generally. The central challenge is therefore not remembering past examples but generalising—making accurate predictions on unfamiliar data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…
When a model looks brilliant but is actually failing
Why training performance can be misleading
During training, a model repeatedly adjusts itself to reduce errors on the examples it is shown. If developers look only at training results, the model may appear increasingly successful. The danger is that the model can start learning details that are specific to the training set rather than learning the underlying pattern that generated the data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…
Imagine a system learning to identify healthy and diseased plants. A genuinely useful model would discover features related to disease. An overfitted model might instead learn irrelevant details that happened to appear in the training photographs, such as lighting conditions or camera angles. It would score highly on familiar images but perform poorly when confronted with new photographs. This is why strong training accuracy alone is not evidence that a model has learned the right lesson. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…
A common misconception is that more accurate fitting is always better. In reality, a model can become so flexible that it begins to match random fluctuations in the data. The model is then effectively memorising examples instead of extracting a rule that applies beyond them. Researchers often describe this as a high-variance model because small changes in the training data can lead to substantially different learned behaviour. [ApX Machine Learning]apxml.comIt captures not only the underlying patterns but also the noise and random fluctuations specific to the…Read more…
A simple example of memorisation
Suppose a teacher wants students to understand arithmetic. A student who memorises the answers to fifty practice questions may achieve a perfect score on those exact questions. However, if the examination contains different numbers, memorisation offers little help. A student who learned the underlying method performs better on unfamiliar problems.
Machine learning faces the same challenge. The goal is not to reproduce training examples but to learn a pattern that continues to work when circumstances change. This ability is called generalisation, and it is one of the most important measures of machine-learning quality. [Google for Developers]developers.google.comGoogle for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring…
How test sets reveal generalisation
Because training results can be deceptive, machine-learning teams reserve some data that the model never sees during training. This separate collection of examples is commonly called a test set. The model is evaluated on these new examples only after training is complete. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…
The logic is straightforward. If a model performs well on both the training data and the unseen test data, it has probably learned something useful. If training performance is excellent but test performance is much worse, overfitting is likely. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…
Many projects also use a validation set, which acts as an intermediate checkpoint during development. Developers use validation results to compare model designs and tune settings, while the test set remains untouched until the final evaluation. This separation helps prevent accidental leakage of information from the test data into the training process. [Unidata]unidata.proValidation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance…
One of the clearest warning signs appears when training error keeps falling while validation or test error stops improving and begins to rise. The model is becoming increasingly specialised to the training examples while losing its ability to handle new ones. Google’s machine-learning guidance identifies this divergence between training and validation performance as a characteristic signal of overfitting. [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…
Why unseen examples matter so much
The real world constantly presents situations that differ from past data. Emails contain new wording, customers change behaviour, medical images come from different equipment, and road conditions vary from those in a training dataset.
A model that only succeeds on familiar examples has little practical value. The usefulness of AI systems depends on their ability to transfer what they learned from past data to future situations. IBM describes generalisation as the key reason machine learning can be used for prediction and classification tasks in everyday applications. When overfitting occurs, that practical value is undermined because performance becomes tied too closely to historical examples. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…
This is why benchmark results are usually reported on held-out test data rather than training data. The test score provides a more realistic estimate of how the model is likely to behave outside the laboratory. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…
Ways teams reduce overfitting
Overfitting cannot always be eliminated, but several widely used techniques help reduce it.
Use more representative data. Models tend to generalise better when training examples capture the diversity of real-world situations. Narrow or unrepresentative datasets make it easier for models to learn misleading shortcuts. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…
Limit unnecessary complexity. Extremely complex models can fit tiny details in the training data. In many cases, a simpler model performs better on unseen examples because it focuses on broader patterns rather than noise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…
Apply regularisation. Regularisation techniques deliberately discourage excessive complexity during training. They introduce penalties that push the model towards simpler explanations, improving its ability to generalise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated…
Use validation and cross-validation. Repeated evaluation on separate data helps detect whether performance gains are genuine or merely the result of memorising the training set. Cross-validation, which rotates different portions of data through training and testing roles, provides a more robust estimate of generalisation performance. [Scikit-learn]scikit-learn.orgUnderfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the…
Stop training at the right time. If validation performance begins to worsen while training performance continues to improve, developers may halt training before the model becomes excessively specialised. Monitoring these curves is a standard defence against overfitting. [Google for Developers]developers.google.cominterpreting loss curvesEnsure that the training set and test set are statistically equivalent. The learning rate is too high. If the…Read more…
The key lesson
Overfitting exposes a fundamental truth about artificial intelligence: success on past examples is not the same as understanding a task. A model can appear impressive when judged by the data it has already seen yet fail when confronted with genuinely new situations. Test sets, validation procedures, and other evaluation methods exist to answer the question that matters most: not “How well did the model remember?” but “How well does it perform on the next example?” [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…
Amazon book picks
Further Reading
Books and field guides related to When a model memorises instead of learning. Use these as the next step if you want deeper reading beyond the article.
Hands-on Machine Learning with Scikit-Learn, Keras, and Tenso...
Covers overfitting, validation, test sets, regularisation, and generalisation in depth.
Pattern Recognition and Machine Learning
Explains the theoretical foundations of overfitting and model generalisation.
An Introduction to Statistical Learning
Strong treatment of bias-variance trade-offs and overfitting.
The Hundred-page Machine Learning Book
Provides an accessible explanation of generalisation and model evaluation.
Endnotes
-
Source: ibm.com
Link: https://www.ibm.com/think/topics/overfittingSource snippet
What is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/overfittingSource snippet
Google for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/model-complexitySource snippet
Google for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m...
-
Source: ibm.com
Link: https://www.ibm.com/think/topics/overfitting-vs-underfittingSource snippet
to memorization instead of generalization.Read more...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/generalizationSource snippet
Google for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring...
-
Source: docs.cloud.google.com
Link: https://docs.cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutionsSource snippet
Google Cloud DocumentationGuidelines for developing high-quality, predictive ML...Jul 8, 2024 — Compare your model performance on the tr...
-
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/stable/modules/learning_curve.htmlSource snippet
Validation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over...
-
Source: unidata.pro
Link: https://unidata.pro/blog/validation-dataset-in-ml/Source snippet
Validation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/[productionSource snippet
ML systems: Monitoring pipelines16 Oct 2025 — You partition the data carefully, ensuring that your training set is well isolated from you...
-
Source: developers.google.com
Title: interpreting loss curves
Link: https://developers.google.com/machine-learning/crash-course/overfitting/interpreting-loss-curvesSource snippet
Ensure that the training set and test set are statistically equivalent. The learning rate is too high. If the...Read more...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/glossary/fundamentalsSource snippet
For example, the following generalization curve suggests overfitting because validation loss...Read more...
-
Source: ibm.com
Link: https://www.ibm.com/think/topics/model-selectionSource snippet
Model Selection in Machine LearningOverfitting means that the model adapts too closely to the training set and cannot generalize to new d...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/regularizationSource snippet
Google for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated...
-
Source: developers.google.com
Title: Incorrect predictions. Learning rate. Complexity.Read more
Link: https://developers.google.com/machine-learning/crash-course/overfitting/quizSource snippet
Google for DevelopersDatasets, generalization, and overfitting: Test Your...Regularization improves your model's ability to generalize t...
-
Source: scikit-learn.org
Link: https://scikit-learn.org/0.18/auto_examples/model_selection/plot_underfitting_overfitting.htmlSource snippet
Underfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfittingSource snippet
google.comDatasets, generalization, and overfitting | Machine LearningDec 3, 2025 — In this module, you'll learn more about the character...
-
Source: developers.google.com
Title: overfitting and pruning
Link: https://developers.google.com/machine-learning/decision-forests/overfitting-and-pruningSource snippet
and pruningAug 25, 2025 — You can disable pruning with the validation dataset by setting validation_ratio=0.0. Those criteria introduce...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/fairnessSource snippet
Machine LearningAug 25, 2025 — This module looks at different types of human biases that can manifest in training data. It then provide...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/llmSource snippet
to Large [Language Models]({{ 'language-models/' | relative_url }}) | Machine LearningJan 9, 2026 — This course module provides an overview of language models and large language mo...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/guides/text-classification/step-4Source snippet
4: Build, Train, and Evaluate Your ModelAug 25, 2025 — In this section, we will work towards building, training and evaluating our model...
-
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/automlSource snippet
Machine Learning (AutoML)Aug 25, 2025 — This course module teaches best practices for using automated machine learning (AutoML) tools in...
-
Source: developers.google.com
Title: dividing datasets
Link: https://developers.google.com/machine-learning/crash-course/overfitting/dividing-datasetsSource snippet
the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into training, validation, and test s...
-
Source: developers.google.com
Title: imbalanced datasets
Link: https://developers.google.com/machine-learning/crash-course/overfitting/imbalanced-datasetsSource snippet
google.comClass-imbalanced datasets | Machine LearningAug 28, 2025 — Learn how to overcome problems with training imbalanced datasets by...
-
Source: developers.google.com
Title: linear regression
Link: https://developers.google.com/machine-learning/crash-course/linear-regressionSource snippet
regression | Machine LearningDec 9, 2025 — This course module teaches the fundamentals of linear regression, including linear equations...
-
Source: ibm.com
Link: https://www.ibm.com/Source snippet
For more than a century, IBM has been a global technology innovator, leading advances in AI, [automation]({{ 'automation-bias/' | relative_url }}) and hybrid cloud solutions tha...
-
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.htmlSource snippet
Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...
-
Source: scikit-learn.org
Link: https://scikit-learn.org/Source snippet
machine learning in Python — scikit-learn 1.9.0...Machine Learning in Python · Simple and efficient tools for predictive d...
-
Source: scikit-learn.org
Title: Underfitting vs
Link: https://scikit-learn.org/1.3/auto_examples/model_selection/plot_underfitting_overfitting.htmlSource snippet
Overfitting — scikit-learn 1.3.2 documentationThis example demonstrates the problems of underfitting and overfitting and how we can use l...
-
Source: scikit-learn.org
Link: https://scikit-learn.org/0.15/auto_examples/plot_underfitting_overfitting.htmlSource snippet
Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...
-
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/index.htmlSource snippet
Some examples demonstrate the use of the API in general and some demonstrate...Read more...
-
Source: apxml.com
Link: https://apxml.com/courses/getting-started-with-scikit-learn/chapter-5-model-selection-evaluation/overfitting-underfittingSource snippet
It captures not only the underlying patterns but also the noise and random fluctuations specific to the...Read more...
-
Source: realpython.com
Link: https://realpython.com/train-test-split-python-data/Source snippet
Real PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis...
-
Source: aws.amazon.com
Link: https://aws.amazon.com/what-is/overfitting/Source snippet
in Machine Learning ExplainedOverfitting is an undesirable machine learning behavior that occurs when the machine learning model gives ac...
-
Source: stackoverflow.com
Link: https://stackoverflow.com/questions/20357705/scikit-learn-cross-validation-over-fitting-or-under-fitting -
Source: coursera.org
Link: https://www.coursera.org/articles/overfittingSource snippet
Data: A Beginner's Guide23 Oct 2025 — Overfitting is a type of machine learning behavior where the machine learning model is accurate for...
-
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/OverfittingSource snippet
OverfittingIn mathematical modeling, overfitting is the production of an analysis that corresponds too closely or exactly to a particu...
-
Source: github.com
Link: https://github.com/cstjean/ScikitLearn.jl/blob/master/examples/Underfitting_vs_Overfitting.ipynbSource snippet
ScikitLearn.jl/examples/Underfitting_vs_Overfitting.ipynb at...This example demonstrates the problems of underfitting and overfitting an...
-
Source: scipy-lectures.org
Link: https://scipy-lectures.org/packages/scikit-learn/index.htmlSource snippet
scikit-learn: machine learning in PythonAgain, this is an example of fitting a model to data, but our focus here is that the model can ma...
Additional References
-
Source: medium.com
Link: https://medium.com/%40boutnaru/artificial-intelligence-overfitting-vs-underfitting-66d83d38b1a0Source snippet
Artificial Intelligence — Overfitting vs UnderfittingOverfitting is a case in which a machine learning model learns the training data too...
-
Source: inria.github.io
Link: https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.htmlSource snippet
Overfit-generalization-underfit — Scikit-learn courseIn this notebook, we put these two errors into perspective and show how they can hel...
-
Source: kaggle.com
Link: https://www.kaggle.com/code/dansbecker/underfitting-and-overfittingSource snippet
Underfitting and OverfittingThis is a phenomenon called overfitting, where a model matches the training data almost perfectly, but does p...
-
Source: inria.github.io
Link: https://inria.github.io/scikit-learn-mooc/overfit/overfitting_vs_under_fitting_slides.htmlSource snippet
🎥 Overfitting and Underfitting — Scikit-learn courseOverfitting and underfitting. Understand when and why a model does or does not genera...
-
Source: medium.com
Link: https://medium.com/%40kocyigit.emre/machine-learning-challenges-6-overfitting-a1d46869803f -
Source: ai.stackexchange.com
Title: Consider a noisy 2d dataset where I am fitting polynomials. A good model would
Link: https://ai.stackexchange.com/questions/43298/why-does-model-overfitting-lead-to-poor-generalizationSource snippet
does model overfitting lead to poor generalization?2 Jan 2024 — If a model overfit to the training data, why does it generalize poorly?...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=tJBfN577mhASource snippet
AI Concepts: Overfitting, Underfitting and GeneralisationOverfitting and underfitting are basically problems that prevent model from gene...
-
Source: youtu.be
Link: https://youtu.be/E3_408q1mjoSource snippet
Gradient Boosting with Regression Trees Explained: [https://youtu.be/lOwsMpdjxog](https://youtu.be/lOwsMpdjxog) P-Values Explained: [https://youtu.be/IZUfbRvsZ9w](https://youtu.be/IZUfbRvsZ9w) Kabsch-U...
-
Source: biztechmagazine.com
Title: what overfitting machine learning and how can it be prevented perfcon
Link: https://biztechmagazine.com/article/2022/03/what-overfitting-machine-learning-and-how-can-it-be-prevented-perfconSource snippet
Overfitting in Machine Learning: What is it & How Can It Be...16 Mar 2022 — Overfitting refers to the use of a data set that is too clos...
-
Source: youtu.be
Link: https://youtu.be/IZUfbRvsZ9wSource snippet
Kabsch-Umeyama Algorithm: [https://youtu.be/nCs_e6fP7Jo](https://youtu.be/nCs_e6fP7Jo) Eigendecomposition Explained: [https://youtu.be/ihUr2LbdYlE](https://youtu.be/ihUr2LbdYlE) Covariance Matrix Expla...
Topic Tree


