The new examples that reveal overfitting

Introduction

A test set is one of the simplest and most powerful tools for detecting overfitting in artificial intelligence. An overfitted model can appear highly successful when judged on the data it learned from, yet fail when faced with genuinely new examples. The purpose of a test set is to reveal this gap. By holding back a portion of data and keeping it unseen during training, developers can check whether a model has learned a general pattern or merely memorised the training examples. When training results are excellent but test results are noticeably worse, the test set provides direct evidence that overfitting has occurred. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Test sets illustration 1

What a test set is for

A test set is a collection of examples deliberately excluded from the learning process. The model does not use these examples to adjust its parameters, choose settings, or improve its performance. Instead, the test set serves as an independent examination that is taken only after development is complete. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

The logic resembles a final exam. A student who has only practised a fixed set of questions may appear knowledgeable, but a new examination reveals whether the underlying subject has actually been understood. In machine learning, the test set plays the role of that new examination. It measures how well the model handles situations it has not previously encountered. [Google for Developers]developers.google.comGoogle for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat…

This matters because the real purpose of AI systems is not to reproduce old answers. Their value comes from making useful predictions about future cases, new customers, unseen images, unfamiliar documents, or other data that did not exist during training. [IBM]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Training scores versus test scores

The clearest sign of overfitting appears when training and test performance diverge.

Consider a model that achieves 99% accuracy on its training data. At first glance, that seems impressive. However, if the same model achieves only 75% accuracy on the test set, the difference suggests that much of what it learned does not transfer to new situations. Rather than discovering general rules, it has adapted itself too closely to the specific details of the training examples. [Amazon Web Services, Inc.]aws.amazon.comWeb Services, Inc.What is Overfitting?Amazon Web Services, Inc.What is Overfitting? - Overfitting in Machine Learning Explained…

Researchers often look for a pattern known as the generalisation gap: the difference between performance on training data and performance on unseen data. A small gap suggests the model has learned useful patterns that carry over to new examples. A large gap suggests overfitting. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

Learning curves and validation curves make this visible. A model that continues improving on the training set while stagnating or worsening on unseen data is showing classic overfitting behaviour. Scikit-learn’s documentation identifies the combination of high training scores and low validation scores as a hallmark of overfitting. [Scikit-learn]scikit-learn.orglearning curveValidation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over…

A concrete example

Imagine an image classifier trained to identify cats and dogs.

If many training photographs of dogs happen to be taken outdoors while cat photographs are mostly indoors, an overly flexible model may learn to associate grass with dogs and furniture with cats. During training, this shortcut may produce excellent scores because those background patterns are common. However, when the test set contains indoor dogs and outdoor cats, performance drops sharply.

The poor test result exposes the fact that the model learned an accidental correlation rather than the concept it was supposed to learn. The training score alone would not reveal this mistake. [Google for Developers]developers.google.comGoogle for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat…

Test sets illustration 2

Why untouched data matters

A test set only works if it remains genuinely untouched.

If developers repeatedly examine test results and alter the model in response, information from the test set gradually leaks into the development process. The test set stops functioning as an independent check and becomes another source of training feedback. This phenomenon is often described as “overfitting the test set”. [Reddit]reddit.comReddit[D] Does most research in ML overfit to the test set in some…The rule is that you should first divide the whole dataset into tra…

For this reason, machine-learning practice usually separates data into three groups:

Training set: used to learn patterns.
Validation set: used during development to compare model versions and tune settings.
Test set: reserved for final evaluation only. [GeeksforGeeks]geeksforgeeks.orgtraining vs testing vs validation setsTraining vs Testing vs Validation SetsJan 6, 2026 — The training set teaches the model patterns, the validation set helps fi…

Keeping the test set isolated preserves its value as evidence. Once the model has indirectly learned from the test data, the reported score no longer reflects performance on truly unseen examples. [Reddit]reddit.comReddit[D] Does most research in ML overfit to the test set in some…The rule is that you should first divide the whole dataset into tra…

Google’s machine-learning guidance also emphasises that duplicates between training and test partitions can create misleadingly optimistic results. If a test example is effectively the same as one used during training, the model is not being challenged with something genuinely new. Removing such overlaps helps ensure a fair evaluation. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

When test results can still be misleading

Although test sets are powerful, they are not perfect.

A test set can only reveal overfitting if it accurately represents the kind of data the model will encounter in practice. If the test set is too small, unrepresentative, or contains systematic errors, its results may give a distorted picture of real-world performance. [Google for Developers]developers.google.comdividing datasetsGoogle for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr…

Researchers have found significant labelling errors in several widely used benchmark test sets. In such cases, measured accuracy can reflect imperfections in the benchmark rather than genuine model quality. This does not make test sets useless, but it highlights that the quality of the test data matters as much as the quality of the model. [arXiv]arxiv.orgOpen source on arxiv.org.

Another limitation is distribution shift. A model may perform well on a carefully prepared test set yet struggle when deployed in a changing environment where future data differs from past data. A test set measures performance on unseen examples from the available dataset; it cannot perfectly predict every future condition. [Google for Developers]developers.google.comGoogle for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting means creating a model that matches (memorizes) the training…

Test sets illustration 3

What test sets reveal about intelligence

The most important lesson from test sets is that intelligence in machine learning is judged by behaviour on new examples, not by success on familiar ones.

An overfitted model can appear almost flawless when evaluated on the data it has already seen. The test set strips away that illusion. By confronting the model with untouched examples, it provides evidence about whether the system has learned a transferable pattern or merely memorised the past. A strong test score does not guarantee perfect real-world performance, but a large gap between training and test results is one of the clearest signs that a model has failed to generalise. IBM+2Amazon Web Services, Inc. [ibm.com]ibm.comWhat is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't…

Amazon book picks

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

Example eBay listing

Machine Learning Revolut Framed Wall Art Poster Canvas Print Picture

Search eBay.co.uk: machine learning poster

Browse similar on eBay.co.uk

Example eBay listing

Palace Learning 4 Pack - Cable Machine Workout Posters Volume 1 & Volume 2 + Dum

Search eBay.co.uk: machine learning poster

Browse similar on eBay.co.uk

Example eBay listing

Palace Learning 3 Pack - Cable Machine Workout Posters 18" x 24", LAMINATED

Search eBay.co.uk: machine learning poster

Browse similar on eBay.co.uk

Example eBay listing

Something about Machine Learning or Framed Wall Art Poster Canvas Print Picture

Search eBay.co.uk: machine learning poster

Browse similar on eBay.co.uk

Browse more on eBay.co.uk

Example items shown for inspiration; availability and pricing can change. Branchoria may earn a commission if you purchase through outbound eBay links.

Endnotes

Source: ibm.com
Link: https://www.ibm.com/think/topics/overfitting
Source snippet
What is Overfitting? | IBMOverfitting occurs when an algorithm fits too closely to its training data, resulting in a model that can't...
Source: aws.amazon.com
Title: Web Services, Inc.What is Overfitting?
Link: https://aws.amazon.com/what-is/overfitting/
Source snippet
Amazon Web Services, Inc.What is Overfitting? - Overfitting in Machine Learning Explained...
Source: developers.google.com
Title: dividing datasets
Link: https://developers.google.com/machine-learning/crash-course/overfitting/dividing-datasets
Source snippet
Google for DevelopersDividing the original dataset | Machine LearningDec 3, 2025 — Learn how to divide a machine learning dataset into tr...
Source: geeksforgeeks.org
Title: training vs testing vs validation sets
Link: https://www.geeksforgeeks.org/machine-learning/training-vs-testing-vs-validation-sets/
Source snippet
Training vs Testing vs Validation SetsJan 6, 2026 — The training set teaches the model patterns, the validation set helps fi...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting
Source snippet
Google for DevelopersDatasets, generalization, and overfitting | Machine LearningThis course module provides guidelines for preparing dat...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/overfitting
Source snippet
Google for DevelopersOverfitting | Machine LearningDec 3, 2025 — Overfitting means creating a model that matches (memorizes) the training...
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/stable/modules/learning_curve.html
Source snippet
Validation curves: plotting scores to evaluate modelsIf the training score is high and the validation score is low, the estimator is over...
Source: geeksforgeeks.org
Title: validation curve using scikit learn
Link: https://www.geeksforgeeks.org/machine-learning/validation-curve-using-scikit-learn/
Source snippet
Validation Curve using Scikit-learnJul 23, 2025 — Overfitting: If the training score is high and the validation score is low, the model i...
Source: reddit.com
Link: https://www.reddit.com/r/MachineLearning/comments/81o4f0/d_does_most_research_in_ml_overfit_to_the_test/
Source snippet
Reddit[D] Does most research in ML overfit to the test set in some...The rule is that you should first divide the whole dataset into tra...
Source: arxiv.org
Title: arXiv Model Similarity Mitigates Test Set Overuse
Link: https://arxiv.org/abs/1905.12580
Source: arxiv.org
Link: https://arxiv.org/abs/2103.14749
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/modules/cross_validation.html
Source snippet
3.1. Cross-validation: evaluating estimator performanceCross-validation provides information about how well an estimator generalizes by e...
Source: google.com
Link: https://www.google.com/
Source snippet
Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exac...
Source: developers.google.com
Title: test your knowledge
Link: https://developers.google.com/machine-learning/crash-course/overfitting/test-your-knowledge
Source snippet
your knowledge | Machine LearningDec 3, 2025 — Test your knowledge of dataset, generalization, and overfitting principles by completing t...
Source: developers.google.com
Link: https://developers.google.com/machine-learning/crash-course/overfitting/quiz
Source snippet
You must answer at least 4 questions correctly to pass this quiz. Which of the following is an example of a stationary dataset?Read more...
Source: developers.google.com
Title: crash course
Link: https://developers.google.com/machine-learning/crash-course
Source snippet
google.comGoogle's Machine Learning Crash CourseDatasets, Generalization, and Overfitting. An introduction to the characteristics of mach...
Source: scikit-learn.org
Link: https://scikit-learn.org/
Source snippet
machine learning in Python — scikit-learn 1.9.0...Machine Learning in Python · Simple and efficient tools for predictive d...
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/0.17/modules/learning_curve.html
Source snippet
Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/0.16/modules/learning_curve.html
Source snippet
Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...
Source: scikit-learn.org
Title: learning curve
Link: https://scikit-learn.org/0.15/modules/learning_curve.html
Source snippet
Validation curves: plotting scores to evaluate modelsA learning curve shows the validation and training score of an estimator for varying...
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
Source snippet
The effect is depicted by [checking]({{ 'checklists/' | relative_url }}) the statistical performance of the model...
Source: scikit-learn.org
Link: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
Source snippet
Underfitting vs. OverfittingWe calculate the mean squared error (MSE) on the validation set, the higher, the less likely the model genera...
Source: scikit-learn.org
Link: https://scikit-learn.org/1.5/auto_examples/model_selection/plot_validation_curve.html
Source snippet
Plotting Validation CurvesIf gamma is too high, the classifier will overfit, which means that the training score is good but the validati...
Source: about.google
Link: https://about.google/
Source snippet
Our products, technology and company...Learn more about Google. Explore our innovative AI products and services, and how we're using tec...
Source: arxiv.org
Link: https://arxiv.org/html/2305.05792v2
Source snippet
Testing for OverfittingMar 10, 2025 — We stipulate conditions under which this test is valid, explain why training data alone is unsuitab...
Source: stackoverflow.com
Link: https://stackoverflow.com/questions/20357705/scikit-learn-cross-validation-over-fitting-or-under-fitting

Additional References

Source: inria.github.io
Link: https://inria.github.io/scikit-learn-mooc/python_scripts/cross_validation_validation_curve.html
Source snippet
Overfit-generalization-underfit — Scikit-learn courseIn this notebook, we put these two errors into perspective and show how they can hel...
Source: stackoverflow.com
Link: https://stackoverflow.com/questions/52717219/machine-learning-python-drawing-validation-curve
Source snippet
"Machine Learning + Python: Drawing Validation curveI want to draw a validation curve for my Naive Bayes estimator like this: [http://scik..."](http://scik...")...
Source: scikit-yb.org
Link: https://www.scikit-yb.org/en/latest/api/model_selection/validation_curve.html
Source snippet
Validation Curve — Yellowbrick v1.5 documentationAfter a depth of 7, the training and test scores diverge, this is because deeper trees a...
Source: github.com
Link: https://github.com/litaotao/machine-learning-crash-course
Source snippet
machine-learning-crash-course from googleThe model's generalization curve above means that the model is overfitting to the data in the tr...
Source: ai.stackexchange.com
Title: Consider a noisy 2d dataset where I am fitting polynomials. A good model would
Link: https://ai.stackexchange.com/questions/43298/why-does-model-overfitting-lead-to-poor-generalization
Source snippet
does model overfitting lead to poor generalization?Jan 2, 2024 — If a model overfit to the training data, why does it generalize poorly?...
Source: youtube.com
Link: https://www.youtube.com/watch?v=rEwiRBdz1us
Source snippet
#127: Scikit-learn 121: Model Selection 9: Validation curvesThe video discusses validation and learning curves in Scikit-learn... Lectur...
Source: youtube.com
Link: https://www.youtube.com/watch?v=TyT8i8YIcwI&vl=en
Source snippet
Machine Learning Crash Course: GeneralizationThe quality of a machine learning model hinges on its ability to generalize: to make good pr...
Source: refontelearning.com
Title: Overfitting happens when a model learns the training data too well –
Link: https://www.refontelearning.com/blog/model-evaluation-and-validation-techniques-in-machine-learning
Source snippet
Model Evaluation and Validation Techniques in Machine...Aug 22, 2025 — Two common issues can hurt generalization: overfitting and underf...
Source: machinelearningmastery.com
Title: learning curves for diagnosing machine learning model performance
Link: https://www.machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
Source snippet
How to use Learning Curves to Diagnose Machine...Aug 6, 2019 — In this post, you will discover learning curves and how they can be used...
Source: jobaajlearnings.com
Title: overfitting in machine learning what it is and how to prevent it
Link: https://www.jobaajlearnings.com/blog/overfitting-in-machine-learning-what-it-is-and-how-to-prevent-it
Source snippet
Overfitting in Machine Learning: What It Is and How to...30 Mar 2026 — In machine learning, overfitting occurs when a model learns the t...

The new examples that reveal overfitting

Introduction

What a test set is for

Training scores versus test scores

A concrete example

Why untouched data matters

When test results can still be misleading

What test sets reveal about intelligence

Further Reading

Hands-on Machine Learning with Scikit-Learn, Keras, and Tenso...

An Introduction to Statistical Learning

Pattern Recognition and Machine Learning

The Hundred-page Machine Learning Book

Marketplace Samples

Machine Learning Revolut Framed Wall Art Poster Canvas Print Picture

Palace Learning 4 Pack - Cable Machine Workout Posters Volume 1 & Volume 2 + Dum

Palace Learning 3 Pack - Cable Machine Workout Posters 18" x 24", LAMINATED

Something about Machine Learning or Framed Wall Art Poster Canvas Print Picture

Endnotes

Additional References

Follow this branch

Parent topic

Related pages 2