Within Overfitting

The new examples that reveal overfitting

A held-out test set gives a clearer check of whether a model can handle examples it did not see during training.

On this page

  • What a test set is for
  • Training scores versus test scores
  • Why untouched data matters
Preview for The new examples that reveal overfitting

Introduction

A test set is one of the simplest and most powerful tools for detecting overfitting in artificial intelligence. An overfitted model can appear highly successful when judged on the data it learned from, yet fail when faced with genuinely new examples. The purpose of a test set is to reveal this gap. By holding back a portion of data and keeping it unseen during training, developers can check whether a model has learned a general pattern or merely memorised the training examples. When training results are excellent but test results are noticeably worse, the test set provides direct evidence that overfitting has occurred. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Test sets illustration 1

What a test set is for

A test set is a collection of examples deliberately excluded from the learning process. The model does not use these examples to adjust its parameters, choose settings, or improve its performance. Instead, the test set serves as an independent examination that is taken only after development is complete. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

The logic resembles a final exam. A student who has only practised a fixed set of questions may appear knowledgeable, but a new examination reveals whether the underlying subject has actually been understood. In machine learning, the test set plays the role of that new examination. It measures how well the model handles situations it has not previously encountered. [Google for Developers]developers.google.comGoogle for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat…

This matters because the real purpose of AI systems is not to reproduce old answers. Their value comes from making useful predictions about future cases, new customers, unseen images, unfamiliar documents, or other data that did not exist during training. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Training scores versus test scores

The clearest sign of overfitting appears when training and test performance diverge.

Consider a model that achieves 99% accuracy on its training data. At first glance, that seems impressive. However, if the same model achieves only 75% accuracy on the test set, the difference suggests that much of what it learned does not transfer to new situations. Rather than discovering general rules, it has adapted itself too closely to the specific details of the training examples. [Amazon Web Services, Inc.]aws.amazon.comWeb Services, Inc.What is Overfitting?Amazon Web Services, Inc.What is Overfitting? - Overfitting in Machine Learning Explained…

Researchers often look for a pattern known as the generalisation gap: the difference between performance on training data and performance on unseen data. A small gap suggests the model has learned useful patterns that carry over to new examples. A large gap suggests overfitting. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

Learning curves and validation curves make this visible. A model that continues improving on the training set while stagnating or worsening on unseen data is showing classic overfitting behaviour. Scikit-learn’s documentation identifies the combination of high training scores and low validation scores as a hallmark of overfitting. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

A concrete example

Imagine an image classifier trained to identify cats and dogs.

If many training photographs of dogs happen to be taken outdoors while cat photographs are mostly indoors, an overly flexible model may learn to associate grass with dogs and furniture with cats. During training, this shortcut may produce excellent scores because those background patterns are common. However, when the test set contains indoor dogs and outdoor cats, performance drops sharply.

The poor test result exposes the fact that the model learned an accidental correlation rather than the concept it was supposed to learn. The training score alone would not reveal this mistake. [Google for Developers]developers.google.comGoogle for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat…

Test sets illustration 2

Why untouched data matters

A test set only works if it remains genuinely untouched.

If developers repeatedly examine test results and alter the model in response, information from the test set gradually leaks into the development process. The test set stops functioning as an independent check and becomes another source of training feedback. This phenomenon is often described as “overfitting the test set”. [Reddit]reddit.comReddit[D] Does most research in ML overfit to the test set in some…The rule is that you should first divide the whole dataset into tra…

For this reason, machine-learning practice usually separates data into three groups:

  • Training set: used to learn patterns.
  • Validation set: used during development to compare model versions and tune settings.
  • Test set: reserved for final evaluation only. [GeeksforGeeks]geeksforgeeks.orgtraining vs testing vs validation setsTraining vs Testing vs Validation SetsJan 6, 2026 — The training set teaches the model patterns, the validation set helps fi…

Keeping the test set isolated preserves its value as evidence. Once the model has indirectly learned from the test data, the reported score no longer reflects performance on truly unseen examples. [Reddit]reddit.comReddit[D] Does most research in ML overfit to the test set in some…The rule is that you should first divide the whole dataset into tra…

Google’s machine-learning guidance also emphasises that duplicates between training and test partitions can create misleadingly optimistic results. If a test example is effectively the same as one used during training, the model is not being challenged with something genuinely new. Removing such overlaps helps ensure a fair evaluation. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

When test results can still be misleading

Although test sets are powerful, they are not perfect.

A test set can only reveal overfitting if it accurately represents the kind of data the model will encounter in practice. If the test set is too small, unrepresentative, or contains systematic errors, its results may give a distorted picture of real-world performance. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

Researchers have found significant labelling errors in several widely used benchmark test sets. In such cases, measured accuracy can reflect imperfections in the benchmark rather than genuine model quality. This does not make test sets useless, but it highlights that the quality of the test data matters as much as the quality of the model. [arXiv]arxiv.orgOpen source on arxiv.org.

Another limitation is distribution shift. A model may perform well on a carefully prepared test set yet struggle when deployed in a changing environment where future data differs from past data. A test set measures performance on unseen examples from the available dataset; it cannot perfectly predict every future condition. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting means creating a model that matches (memorizes) the training…

Test sets illustration 3

What test sets reveal about intelligence

The most important lesson from test sets is that intelligence in machine learning is judged by behaviour on new examples, not by success on familiar ones.

An overfitted model can appear almost flawless when evaluated on the data it has already seen. The test set strips away that illusion. By confronting the model with untouched examples, it provides evidence about whether the system has learned a transferable pattern or merely memorised the past. A strong test score does not guarantee perfect real-world performance, but a large gap between training and test results is one of the clearest signs that a model has failed to generalise. IBM+2Amazon Web Services, Inc. [ibm.com]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Amazon book picks

Further Reading

Books and field guides related to The new examples that reveal overfitting. Use these as the next step if you want deeper reading beyond the article.

eBay marketplace picks

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

Using USA

Endnotes

  1. Source: ibm.com
    Link: https://www.ibm.com/think/topics/overfitting
    Source snippet

    What is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't...

  2. Source: aws.amazon.com
    Title: Web Services, Inc.What is Overfitting?
    Link: https://aws.amazon.com/what-is/overfitting/
    Source snippet

    Amazon Web Services, Inc.What is Overfitting? - Overfitting in Machine Learning Explained...

  3. Source: developers.google.com
    Title: dividing datasets
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/dividing-datasets
    Source snippet

    Google for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr...

  4. Source: geeksforgeeks.org
    Title: training vs testing vs validation sets
    Link: https://www.geeksforgeeks.org/machine-learning/training-vs-testing-vs-validation-sets/
    Source snippet

    Training vs Testing vs Validation SetsJan 6, 2026 — The training set teaches the model patterns, the validation set helps fi...

  5. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting
    Source snippet

    Google for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat...

  6. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/overfitting
    Source snippet

    Google for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting means creating a model that matches (memorizes) the training...

  7. Source: scikit-learn.org
    Title: learning curve
    Link: https://scikit-learn.org/stable/modules/learning_curve.html
    Source snippet

    Validation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over...

  8. Source: geeksforgeeks.org
    Title: validation curve using scikit learn
    Link: https://www.geeksforgeeks.org/machine-learning/validation-curve-using-scikit-learn/
    Source snippet

    Validation Curve using Scikit-learnJul 23, 2025 — Overfitting: If the training score is high and the validation score is low, the model i...

  9. Source: reddit.com
    Link: https://www.reddit.com/r/MachineLearning/comments/81o4f0/d_does_most_research_in_ml_overfit_to_the_test/
    Source snippet

    Reddit[D] Does most research in ML overfit to the test set in some...The rule is that you should first divide the whole dataset into tra...

  10. Source: arxiv.org
    Title: arXiv Model Similarity Mitigates Test Set Overuse
    Link: https://arxiv.org/abs/1905.12580

  11. Source: arxiv.org
    Link: https://arxiv.org/abs/2103.14749

  12. Source: scikit-learn.org
    Link: https://scikit-learn.org/stable/modules/cross_validation.html
    Source snippet

    3.1. Cross-validation: evaluating estimator performanceCross-validation provides information about how well an estimator generalizes by e...

  13. Source: google.com
    Link: https://www.google.com/
    Source snippet

    Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exac...

  14. Source: developers.google.com
    Title: test your knowledge
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/test-your-knowledge
    Source snippet

    your knowledge | Machine LearningDec 3, 2025 — Test your knowledge of dataset, generalization, and overfitting principles by completing t...

  15. Source: developers.google.com
    Link: https://developers.google.com/machine-learning/crash-course/overfitting/quiz
    Source snippet

    You must answer at least 4 questions correctly to pass this quiz. Which of the following is an example of a stationary dataset?Read more...

  16. Source: developers.google.com
    Title: crash course
    Link: https://developers.google.com/machine-learning/crash-course
    Source snippet

    google.comGoogle's Machine Learning Crash CourseDatasets, Generalization, and Overfitting. An introduction to the characteristics of mach...

  17. Source: scikit-learn.org
    Link: https://scikit-learn.org/
    Source snippet

    machine learning in Python — scikit-learn 1.9.0...Machine Learning in Python · Simple and efficient tools for predictive d...

  18. Source: scikit-learn.org
    Title: learning curve
    Link: https://scikit-learn.org/0.17/modules/learning_curve.html
    Source snippet

    Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...

  19. Source: scikit-learn.org
    Title: learning curve
    Link: https://scikit-learn.org/0.16/modules/learning_curve.html
    Source snippet

    Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...

  20. Source: scikit-learn.org
    Title: learning curve
    Link: https://scikit-learn.org/0.15/modules/learning_curve.html
    Source snippet

    Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...

  21. Source: scikit-learn.org
    Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
    Source snippet

    The effect is depicted by [checking]({{ 'checklists/' | relative_url }}) the statistical performance of the model...

  22. Source: scikit-learn.org
    Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
    Source snippet

    Underfitting vs. OverfittingWe calculate the mean squared error (MSE) on the validation set, the higher, the less likely the model genera...

  23. Source: scikit-learn.org
    Link: https://scikit-learn.org/1.5/auto_examples/model_selection/plot_validation_curve.html
    Source snippet

    Plotting Validation CurvesIf gamma is too high, the classifier will overfit, which means that the training score is good but the validati...

  24. Source: about.google
    Link: https://about.google/
    Source snippet

    Our products, technology and company...Learn more about Google. Explore our innovative AI products and services, and how we're using tec...

  25. Source: arxiv.org
    Link: https://arxiv.org/html/2305.05792v2
    Source snippet

    Testing for OverfittingMar 10, 2025 — We stipulate conditions under which this test is valid, explain why training data alone is unsuitab...

  26. Source: stackoverflow.com
    Link: https://stackoverflow.com/questions/20357705/scikit-learn-cross-validation-over-fitting-or-under-fitting

Additional References

  1. Source: inria.github.io
    Link: https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html
    Source snippet

    Overfit-generalization-underfit — Scikit-learn courseIn this notebook, we put these two errors into perspective and show how they can hel...

  2. Source: stackoverflow.com
    Link: https://stackoverflow.com/questions/52717219/machine-learning-python-drawing-validation-curve
    Source snippet

    "Machine Learning + Python: Drawing Validation curveI want to draw a validation curve for my Naive Bayes estimator like this: [http://scik..."](http://scik...")...

  3. Source: scikit-yb.org
    Link: https://www.scikit-yb.org/en/latest/api/model_selection/validation_curve.html
    Source snippet

    Validation Curve — Yellowbrick v1.5 documentationAfter a depth of 7, the training and test scores diverge, this is because deeper trees a...

  4. Source: github.com
    Link: https://github.com/litaotao/machine-learning-crash-course
    Source snippet

    machine-learning-crash-course from googleThe model's generalization curve above means that the model is overfitting to the data in the tr...

  5. Source: ai.stackexchange.com
    Title: Consider a noisy 2d dataset where I am fitting polynomials. A good model would
    Link: https://ai.stackexchange.com/questions/43298/why-does-model-overfitting-lead-to-poor-generalization
    Source snippet

    does model overfitting lead to poor generalization?Jan 2, 2024 — If a model overfit to the training data, why does it generalize poorly?...

  6. Source: youtube.com
    Link: https://www.youtube.com/watch?v=rEwiRBdz1us
    Source snippet

    #127: Scikit-learn 121: Model Selection 9: Validation curvesThe video discusses validation and learning curves in Scikit-learn... Lectur...

  7. Source: youtube.com
    Link: https://www.youtube.com/watch?v=TyT8i8YIcwI&vl=en
    Source snippet

    Machine Learning Crash Course: GeneralizationThe quality of a machine learning model hinges on its ability to generalize: to make good pr...

  8. Source: refontelearning.com
    Title: Overfitting happens when a model learns the training data too well –
    Link: https://www.refontelearning.com/blog/model-evaluation-and-validation-techniques-in-machine-learning
    Source snippet

    Model Evaluation and Validation Techniques in Machine...Aug 22, 2025 — Two common issues can hurt generalization: overfitting and underf...

  9. Source: machinelearningmastery.com
    Title: learning curves for diagnosing machine learning model performance
    Link: https://www.machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
    Source snippet

    How to use Learning Curves to Diagnose Machine...Aug 6, 2019 — In this post, you will discover learning curves and how they can be used...

  10. Source: jobaajlearnings.com
    Title: overfitting in machine learning what it is and how to prevent it
    Link: https://www.jobaajlearnings.com/blog/overfitting-in-machine-learning-what-it-is-and-how-to-prevent-it
    Source snippet

    Overfitting in Machine Learning: What It Is and How to...30 Mar 2026 — In machine learning, overfitting occurs when a model learns the t...

Topic Tree

Follow this branch

Parent topic

Overfitting When a model memorises instead of learning

Related pages 2