Within Machine Learning

When a model memorises instead of learning

Overfitting explains why a model can look excellent in training but fail when it meets cases it has never seen before.

On this page

  • Why training performance can be misleading
  • How test sets reveal generalisation
  • Ways teams reduce overfitting
Preview for When a model memorises instead of learning

Introduction

A machine-learning model is judged by what it does with data it has never seen before. A model that achieves near-perfect results on its training examples may still be a poor AI system if its performance collapses when faced with new cases. This problem is known as overfitting: the model has learned the training data too closely, including quirks, noise, and accidental patterns that do not hold more generally. The central challenge is therefore not remembering past examples but generalising—making accurate predictions on unfamiliar data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Overfitting illustration 1

When a model looks brilliant but is actually failing

Why training performance can be misleading

During training, a model repeatedly adjusts itself to reduce errors on the examples it is shown. If developers look only at training results, the model may appear increasingly successful. The danger is that the model can start learning details that are specific to the training set rather than learning the underlying pattern that generated the data. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Imagine a system learning to identify healthy and diseased plants. A genuinely useful model would discover features related to disease. An overfitted model might instead learn irrelevant details that happened to appear in the training photographs, such as lighting conditions or camera angles. It would score highly on familiar images but perform poorly when confronted with new photographs. This is why strong training accuracy alone is not evidence that a model has learned the right lesson. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…

A common misconception is that more accurate fitting is always better. In reality, a model can become so flexible that it begins to match random fluctuations in the data. The model is then effectively memorising examples instead of extracting a rule that applies beyond them. Researchers often describe this as a high-variance model because small changes in the training data can lead to substantially different learned behaviour. [ApX Machine Learning]apxml.comIt captures not only the underlying patterns but also the noise and random fluctuations specific to the…Read more…

A simple example of memorisation

Suppose a teacher wants students to understand arithmetic. A student who memorises the answers to fifty practice questions may achieve a perfect score on those exact questions. However, if the examination contains different numbers, memorisation offers little help. A student who learned the underlying method performs better on unfamiliar problems.

Machine learning faces the same challenge. The goal is not to reproduce training examples but to learn a pattern that continues to work when circumstances change. This ability is called generalisation, and it is one of the most important measures of machine-learning quality. [Google for Developers]developers.google.comGoogle for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring…

How test sets reveal generalisation

Because training results can be deceptive, machine-learning teams reserve some data that the model never sees during training. This separate collection of examples is commonly called a test set. The model is evaluated on these new examples only after training is complete. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…

The logic is straightforward. If a model performs well on both the training data and the unseen test data, it has probably learned something useful. If training performance is excellent but test performance is much worse, overfitting is likely. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

Many projects also use a validation set, which acts as an intermediate checkpoint during development. Developers use validation results to compare model designs and tune settings, while the test set remains untouched until the final evaluation. This separation helps prevent accidental leakage of information from the test data into the training process. [Unidata]unidata.proValidation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance…

One of the clearest warning signs appears when training error keeps falling while validation or test error stops improving and begins to rise. The model is becoming increasingly specialised to the training examples while losing its ability to handle new ones. Google’s machine-learning guidance identifies this divergence between training and validation performance as a characteristic signal of overfitting. [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Overfitting illustration 2

Why unseen examples matter so much

The real world constantly presents situations that differ from past data. Emails contain new wording, customers change behaviour, medical images come from different equipment, and road conditions vary from those in a training dataset.

A model that only succeeds on familiar examples has little practical value. The usefulness of AI systems depends on their ability to transfer what they learned from past data to future situations. IBM describes generalisation as the key reason machine learning can be used for prediction and classification tasks in everyday applications. When overfitting occurs, that practical value is undermined because performance becomes tied too closely to historical examples. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

This is why benchmark results are usually reported on held-out test data rather than training data. The test score provides a more realistic estimate of how the model is likely to behave outside the laboratory. [Real Python]realpython.comReal PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis…

Ways teams reduce overfitting

Overfitting cannot always be eliminated, but several widely used techniques help reduce it.

Use more representative data. Models tend to generalise better when training examples capture the diversity of real-world situations. Narrow or unrepresentative datasets make it easier for models to learn misleading shortcuts. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Limit unnecessary complexity. Extremely complex models can fit tiny details in the training data. In many cases, a simpler model performs better on unseen examples because it focuses on broader patterns rather than noise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m…

Apply regularisation. Regularisation techniques deliberately discourage excessive complexity during training. They introduce penalties that push the model towards simpler explanations, improving its ability to generalise. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated…

Use validation and cross-validation. Repeated evaluation on separate data helps detect whether performance gains are genuine or merely the result of memorising the training set. Cross-validation, which rotates different portions of data through training and testing roles, provides a more robust estimate of generalisation performance. [Scikit-learn]scikit-learn.orgUnderfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the…

Stop training at the right time. If validation performance begins to worsen while training performance continues to improve, developers may halt training before the model becomes excessively specialised. Monitoring these curves is a standard defence against overfitting. [Google for Developers]developers.google.cominterpreting loss curvesEnsure that the training set and test set are statistically equivalent. The learning rate is too high. If the…Read more…

Overfitting illustration 3

The key lesson

Overfitting exposes a fundamental truth about artificial intelligence: success on past examples is not the same as understanding a task. A model can appear impressive when judged by the data it has already seen yet fail when confronted with genuinely new situations. Test sets, validation procedures, and other evaluation methods exist to answer the question that matters most: not “How well did the model remember?” but “How well does it perform on the next example?” [Google for Developers+2Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl…

Amazon book picks

Further Reading

Books and field guides related to When a model memorises instead of learning. Use these as the next step if you want deeper reading beyond the article.

eBay marketplace picks

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

Using USA

Endnotes

  1. Source: ibm.com
    Link: https://www.ibm.com/think/topics/overfitting
    Source snippet

    What is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't...

  2. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/overfitting
    Source snippet

    Google for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting occurs when a model performs well on training data but poorl...

  3. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/model-complexity
    Source snippet

    Google for DevelopersOverfitting: Model complexity | Machine LearningDec 3, 2025 — The simple model generalizes better than the complex m...

  4. Source: ibm.com
    Link: https://www.ibm.com/think/topics/overfitting-vs-underfitting
    Source snippet

    to memorization instead of generalization.Read more...

  5. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/generalization
    Source snippet

    Google for DevelopersGeneralization | Machine LearningAug 25, 2025 — Learn about the machine learning concept of generalization: ensuring...

  6. Source: docs.cloud.google.com
    Link: https://docs.cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions
    Source snippet

    Google Cloud DocumentationGuidelines for developing high-quality, predictive ML...Jul 8, 2024 — Compare your model performance on the tr...

  7. Source: scikit-learn.org
    Title: learning curve
    Link: https://scikit-learn.org/stable/modules/learning_curve.html
    Source snippet

    Validation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over...

  8. Source: unidata.pro
    Link: https://unidata.pro/blog/validation-dataset-in-ml/
    Source snippet

    Validation Dataset in Machine Learning - Unidata13 Sept 2024 — Validation set (~15%) — used during training to monitor performance...

  9. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/[production
    Source snippet

    ML systems: Monitoring pipelines16 Oct 2025 — You partition the data carefully, ensuring that your training set is well isolated from you...

  10. Source: developers.google.com
    Title: interpreting loss curves
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/interpreting-loss-curves
    Source snippet

    Ensure that the training set and test set are statistically equivalent. The learning rate is too high. If the...Read more...

  11. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/glossary/fundamentals
    Source snippet

    For example, the following generalization curve suggests overfitting because validation loss...Read more...

  12. Source: ibm.com
    Link: https://www.ibm.com/think/topics/model-selection
    Source snippet

    Model Selection in Machine LearningOverfitting means that the model adapts too closely to the training set and cannot generalize to new d...

  13. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/regularization
    Source snippet

    Google for DevelopersOverfitting: L2 regularization | Machine LearningDec 3, 2025 — Learn how the L2 regularization metric is calculated...

  14. Source: developers.google.com
    Title: Incorrect predictions. Learning rate. Complexity.Read more
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/quiz
    Source snippet

    Google for DevelopersDatasets, generalization, and overfitting: Test Your...Regularization improves your model's ability to generalize t...

  15. Source: scikit-learn.org
    Link: https://scikit-learn.org/0.18/auto_examples/model_selection/plot_underfitting_overfitting.html
    Source snippet

    Underfitting vs. OverfittingWe evaluate quantitatively overfitting / underfitting by using cross-validation. We calculate the...

  16. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting
    Source snippet

    google.comDatasets, generalization, and overfitting | Machine LearningDec 3, 2025 — In this module, you'll learn more about the character...

  17. Source: developers.google.com
    Title: overfitting and pruning
    Link: https://developers.google.com/machine-learning/decision-forests/overfitting-and-pruning
    Source snippet

    and pruningAug 25, 2025 — You can disable pruning with the validation dataset by setting validation_ratio=0.0. Those criteria introduce...

  18. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/fairness
    Source snippet

    Machine LearningAug 25, 2025 — This module looks at different types of human biases that can manifest in training data. It then provide...

  19. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/llm
    Source snippet

    to Large [Language Models]({{ 'language-models/' | relative_url }}) | Machine LearningJan 9, 2026 — This course module provides an overview of language models and large language mo...

  20. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/guides/text-classification/step-4
    Source snippet

    4: Build, Train, and Evaluate Your ModelAug 25, 2025 — In this section, we will work towards building, training and evaluating our model...

  21. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/automl
    Source snippet

    Machine Learning (AutoML)Aug 25, 2025 — This course module teaches best practices for using automated machine learning (AutoML) tools in...

  22. Source: developers.google.com
    Title: dividing datasets
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/dividing-datasets
    Source snippet

    the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into training, validation, and test s...

  23. Source: developers.google.com
    Title: imbalanced datasets
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/imbalanced-datasets
    Source snippet

    google.comClass-imbalanced datasets | Machine LearningAug 28, 2025 — Learn how to overcome problems with training imbalanced datasets by...

  24. Source: developers.google.com
    Title: linear regression
    Link: https://developers.google.com/machine-learning/crash-course/linear-regression
    Source snippet

    regression | Machine LearningDec 9, 2025 — This course module teaches the fundamentals of linear regression, including linear equations...

  25. Source: ibm.com
    Link: https://www.ibm.com/
    Source snippet

    For more than a century, IBM has been a global technology innovator, leading advances in AI, [automation]({{ 'automation-bias/' | relative_url }}) and hybrid cloud solutions tha...

  26. Source: scikit-learn.org
    Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
    Source snippet

    Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...

  27. Source: scikit-learn.org
    Link: https://scikit-learn.org/
    Source snippet

    machine learning in Python — scikit-learn 1.9.0...Machine Learning in Python · Simple and efficient tools for predictive d...

  28. Source: scikit-learn.org
    Title: Underfitting vs
    Link: https://scikit-learn.org/1.3/auto_examples/model_selection/plot_underfitting_overfitting.html
    Source snippet

    Overfitting — scikit-learn 1.3.2 documentationThis example demonstrates the problems of underfitting and overfitting and how we can use l...

  29. Source: scikit-learn.org
    Link: https://scikit-learn.org/0.15/auto_examples/plot_underfitting_overfitting.html
    Source snippet

    Underfitting vs. OverfittingThis example demonstrates the problems of underfitting and overfitting and how we can use linear regression w...

  30. Source: scikit-learn.org
    Link: https://scikit-learn.org/stable/auto_examples/index.html
    Source snippet

    Some examples demonstrate the use of the API in general and some demonstrate...Read more...

  31. Source: apxml.com
    Link: https://apxml.com/courses/getting-started-with-scikit-learn/chapter-5-model-selection-evaluation/overfitting-underfitting
    Source snippet

    It captures not only the underlying patterns but also the noise and random fluctuations specific to the...Read more...

  32. Source: realpython.com
    Link: https://realpython.com/train-test-split-python-data/
    Source snippet

    Real PythonSplit Your Dataset With scikit-learn's train_test_splitIn this tutorial, you'll learn why splitting your dataset in supervis...

  33. Source: aws.amazon.com
    Link: https://aws.amazon.com/what-is/overfitting/
    Source snippet

    in Machine Learning ExplainedOverfitting is an undesirable machine learning behavior that occurs when the machine learning model gives ac...

  34. Source: stackoverflow.com
    Link: https://stackoverflow.com/questions/20357705/scikit-learn-cross-validation-over-fitting-or-under-fitting

  35. Source: coursera.org
    Link: https://www.coursera.org/articles/overfitting
    Source snippet

    Data: A Beginner's Guide23 Oct 2025 — Overfitting is a type of machine learning behavior where the machine learning model is accurate for...

  36. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/Overfitting
    Source snippet

    OverfittingIn mathematical modeling, overfitting is the production of an analysis that corresponds too closely or exactly to a particu...

  37. Source: github.com
    Link: https://github.com/cstjean/ScikitLearn.jl/blob/master/examples/Underfitting_vs_Overfitting.ipynb
    Source snippet

    ScikitLearn.jl/examples/Underfitting_vs_Overfitting.ipynb at...This example demonstrates the problems of underfitting and overfitting an...

  38. Source: scipy-lectures.org
    Link: https://scipy-lectures.org/packages/scikit-learn/index.html
    Source snippet

    scikit-learn: machine learning in PythonAgain, this is an example of fitting a model to data, but our focus here is that the model can ma...

Additional References

  1. Source: medium.com
    Link: https://medium.com/%40boutnaru/artificial-intelligence-overfitting-vs-underfitting-66d83d38b1a0
    Source snippet

    Artificial Intelligence — Overfitting vs UnderfittingOverfitting is a case in which a machine learning model learns the training data too...

  2. Source: inria.github.io
    Link: https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html
    Source snippet

    Overfit-generalization-underfit — Scikit-learn courseIn this notebook, we put these two errors into perspective and show how they can hel...

  3. Source: kaggle.com
    Link: https://www.kaggle.com/code/dansbecker/underfitting-and-overfitting
    Source snippet

    Underfitting and OverfittingThis is a phenomenon called overfitting, where a model matches the training data almost perfectly, but does p...

  4. Source: inria.github.io
    Link: https://inria.github.io/scikit-learn-mooc/overfit/overfitting_vs_under_fitting_slides.html
    Source snippet

    🎥 Overfitting and Underfitting — Scikit-learn courseOverfitting and underfitting. Understand when and why a model does or does not genera...

  5. Source: medium.com
    Link: https://medium.com/%40kocyigit.emre/machine-learning-challenges-6-overfitting-a1d46869803f

  6. Source: ai.stackexchange.com
    Title: Consider a noisy 2d dataset where I am fitting polynomials. A good model would
    Link: https://ai.stackexchange.com/questions/43298/why-does-model-overfitting-lead-to-poor-generalization
    Source snippet

    does model overfitting lead to poor generalization?2 Jan 2024 — If a model overfit to the training data, why does it generalize poorly?...

  7. Source: youtube.com
    Link: https://www.youtube.com/watch?v=tJBfN577mhA
    Source snippet

    AI Concepts: Overfitting, Underfitting and GeneralisationOverfitting and underfitting are basically problems that prevent model from gene...

  8. Source: youtu.be
    Link: https://youtu.be/E3_408q1mjo
    Source snippet

    Gradient Boosting with Regression Trees Explained: [https://youtu.be/lOwsMpdjxog](https://youtu.be/lOwsMpdjxog) P-Values Explained: [https://youtu.be/IZUfbRvsZ9w](https://youtu.be/IZUfbRvsZ9w) Kabsch-U...

  9. Source: biztechmagazine.com
    Title: what overfitting machine learning and how can it be prevented perfcon
    Link: https://biztechmagazine.com/article/2022/03/what-overfitting-machine-learning-and-how-can-it-be-prevented-perfcon
    Source snippet

    Overfitting in Machine Learning: What is it & How Can It Be...16 Mar 2022 — Overfitting refers to the use of a data set that is too clos...

  10. Source: youtu.be
    Link: https://youtu.be/IZUfbRvsZ9w
    Source snippet

    Kabsch-Umeyama Algorithm: [https://youtu.be/nCs_e6fP7Jo](https://youtu.be/nCs_e6fP7Jo) Eigendecomposition Explained: [https://youtu.be/ihUr2LbdYlE](https://youtu.be/ihUr2LbdYlE) Covariance Matrix Expla...

Topic Tree

Follow this branch

Parent topic

Machine Learning How Machines Learn From Examples

Related pages 4

More on this topic 3