What AI labels should tell us before launch
Datasheets and model cards make training choices, intended uses, and performance limits easier to inspect before an AI system is trusted.
On this page
- What datasheets disclose about datasets
- What model cards disclose about trained systems
- Where documentation helps and where it cannot guarantee safety
Introduction
Before an AI system is deployed, one of the most important questions is not how powerful the model is, but whether anyone can clearly explain where its data came from, how it was trained, what it was tested on, and where it is likely to fail. Dataset datasheets and model cards were developed to answer those questions. They are documentation frameworks designed to make AI systems more transparent, helping organisations decide whether a model is suitable for a particular use before it affects real people. Rather than treating AI as a black box, these documents expose key design choices, assumptions, and limitations that might otherwise remain hidden. [arXiv]
For governance and risk management, this matters because many deployment failures arise not from a lack of technical sophistication but from a mismatch between how a system was built and how it is actually used. Documentation cannot guarantee that an AI system is safe or fair, but it can make important risks visible before deployment decisions are made. [NIST]
What AI labels should tell us before launch
The idea behind datasheets and model cards is similar to the labels that accompany medicines, electrical equipment, or industrial components. Users need information about intended use, testing conditions, limitations, and known risks before deciding whether a system can be trusted in a particular setting.
The proposal for “Datasheets for Datasets” emerged from concerns that machine-learning datasets were often shared without adequate information about their origins, collection methods, composition, or intended uses. The authors argued that every dataset should be accompanied by structured documentation covering why it was created, how it was assembled, what populations it contains, and where it should or should not be used. [arXiv+2Microsoft]
A related proposal, “Model Cards for Model Reporting”, focused on trained models rather than datasets. Model cards were designed to describe a model’s intended uses, evaluation procedures, performance characteristics, limitations, and known risks. The goal was to help decision-makers understand not merely that a model works, but under what conditions it works and for whom. [arXiv+2ACM Digital Library]
What datasheets disclose about datasets
Datasets are often treated as raw inputs, yet they encode many of the assumptions that later shape model behaviour. A datasheet aims to expose those assumptions before deployment.
Where the data came from
A useful datasheet explains the motivation for creating a dataset, who collected it, how the information was obtained, and whether individuals consented to its use where relevant. It should also identify funding sources, collection methods, and any significant preprocessing or filtering steps. [arXiv+2mlr3 Fairness]
This information matters because two datasets may appear similar while reflecting very different populations or collection practices. Without documentation, organisations may unknowingly deploy models trained on data that does not resemble their target environment.
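One way to make these provenance questions concrete is to record them as a structured object that can be checked for gaps. The following is a minimal Python sketch under stated assumptions: the class name `DatasheetProvenance`, its field names, and the `missing_fields` helper are hypothetical illustrations, not a format defined by the Datasheets for Datasets proposal.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DatasheetProvenance:
    """Hypothetical record of the provenance questions a datasheet answers."""
    motivation: Optional[str] = None          # why the dataset was created
    collected_by: Optional[str] = None        # who collected it
    collection_method: Optional[str] = None   # how the data was obtained
    consent_status: Optional[str] = None      # whether consent was obtained, where relevant
    funding_source: Optional[str] = None      # who funded the work
    preprocessing_steps: List[str] = field(default_factory=list)  # filtering, cleaning


def missing_fields(sheet: DatasheetProvenance) -> List[str]:
    """Return the names of fields that were left empty (None, "" or [])."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name)]


# Made-up example values, for illustration only.
sheet = DatasheetProvenance(
    motivation="Benchmark for speech recognition research",
    collected_by="Example university lab",
    collection_method="Volunteer recordings via a web form",
)
print(missing_fields(sheet))
# -> ['consent_status', 'funding_source', 'preprocessing_steps']
```

A reviewer could treat a non-empty result as a prompt to go back to the dataset's creators before the data is used, rather than as an automatic rejection.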
Who and what the dataset represents
Datasheets typically describe dataset composition, including the types of examples included, the size of the dataset, and any known gaps or imbalances. This helps reviewers determine whether important groups, locations, languages, or scenarios are under-represented. [Microsoft+2MDSD4Health]
For example, a model intended for global deployment may have been trained primarily on data from a small number of countries. A deployment team that sees this information before launch can investigate whether additional testing or retraining is necessary.
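A composition check of this kind can be sketched in a few lines of Python. The example below is illustrative only: the function names `group_shares` and `under_represented`, the 15% threshold, and the toy country labels are all assumptions made for the sketch, and a real review would choose groups and thresholds to suit the deployment.

```python
from collections import Counter
from typing import Dict, Iterable, List


def group_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each group's share of the dataset as a fraction of all examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def under_represented(shares: Dict[str, float],
                      expected: Iterable[str],
                      min_share: float) -> List[str]:
    """List expected groups whose share is below min_share (absent groups count as 0)."""
    return sorted(g for g in expected if shares.get(g, 0.0) < min_share)


# Toy country labels for 10 training examples (made-up data).
countries = ["US"] * 7 + ["GB"] * 2 + ["IN"] * 1
shares = group_shares(countries)
print(under_represented(shares, expected=["US", "GB", "IN", "BR"], min_share=0.15))
# -> ['BR', 'IN']
```

Here `BR` is flagged because it never appears in the data and `IN` because it makes up only 10% of examples, which is the kind of gap a datasheet's composition section is meant to surface.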
Recommended and unsuitable uses
A dataset may be appropriate for one task and inappropriate for another. Datasheets encourage creators to document recommended uses and known limitations. This helps prevent inappropriate reuse, in which information collected for one purpose is later applied in contexts for which it was never designed. [arXiv+2Overleaf]
From a governance perspective, this is valuable because it shifts attention from raw accuracy claims to questions of fitness for purpose.
What model cards disclose about trained systems
While datasheets focus on training data, model cards focus on the behaviour of the trained system itself.
Intended use and deployment boundaries
A model card should explain what the model was designed to do and, equally importantly, what it was not designed to do. Intended-use statements help organisations avoid deploying systems in settings that differ substantially from the conditions under which they were developed and tested. [arXiv+2IAPP.org]
This may seem straightforward, but many deployment problems stem from using a model outside its validated scope. A model built to assist human reviewers, for example, may be unsuitable as a fully automated decision-maker.
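An intended-use statement becomes easier to act on when it is recorded in a form that a deployment review can query. The sketch below assumes a hypothetical `IntendedUse` record and `scope_status` helper; the use-case labels are invented for illustration and do not come from any published model card.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical intended-use section of a model card."""
    in_scope: FrozenSet[str]       # uses the developers validated
    out_of_scope: FrozenSet[str]   # uses the developers explicitly warn against


def scope_status(card: IntendedUse, proposed_use: str) -> str:
    """Classify a proposed use against what the card documents."""
    if proposed_use in card.out_of_scope:
        return "out_of_scope"
    if proposed_use in card.in_scope:
        return "in_scope"
    # Neither endorsed nor ruled out: the card is silent, so a human should review.
    return "undocumented"


card = IntendedUse(
    in_scope=frozenset({"assist_human_reviewer"}),
    out_of_scope=frozenset({"fully_automated_decision"}),
)
for use in ("assist_human_reviewer", "fully_automated_decision", "batch_triage"):
    print(use, "->", scope_status(card, use))
```

The third result, `undocumented`, is deliberately distinct from `in_scope`: a use the card never mentions has not been validated, so it should not be treated as approved by default.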
How performance was measured
Aggregate accuracy figures can hide important weaknesses. Model cards therefore encourage disclosure of evaluation procedures, benchmark datasets, testing environments, and performance metrics. They also promote reporting across different demographic and contextual groups rather than relying on a single headline score. [arXiv+2ResearchGate]
This information allows deployment teams to ask practical questions: Was the model tested on populations similar to ours? Were edge cases evaluated? Were error rates consistent across groups?
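The gap between a headline score and per-group results can be shown with a small worked example. This is a sketch with made-up records; the function name `error_rates_by_group` and the group labels are assumptions for illustration, not part of the model card proposal.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def error_rates_by_group(records: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute the error rate per group from (group, true_label, predicted_label) rows."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        if y_true != y_pred:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}


# Made-up evaluation rows: (group, true label, predicted label).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

overall = sum(t != p for _, t, p in records) / len(records)
by_group = error_rates_by_group(records)
print(overall)    # -> 0.25
print(by_group)   # -> {'group_a': 0.0, 'group_b': 0.5}
```

An overall error rate of 25% looks moderate, yet every error falls on `group_b`, whose error rate is 50%. Reporting only the aggregate would hide exactly the weakness a deployment team needs to see.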
Known limitations and failure modes
A well-designed model card documents situations in which performance degrades or uncertainty increases. This may include limitations related to language coverage, environmental conditions, demographic variation, data quality, or adversarial inputs. [Alan Turing Institute+2Practical AI Act]
For governance purposes, these disclosures help organisations design safeguards, monitoring procedures, and human oversight mechanisms before deployment rather than after a failure occurs.
Why documentation changes deployment decisions
Documentation is often viewed as an administrative exercise, but its real value lies in supporting decision-making.
Before deployment, reviewers typically need to answer questions such as:
- Does the training data resemble the environment where the system will be used?
- Were relevant populations represented during development and testing?
- Has the model been evaluated under realistic operating conditions?
- What kinds of errors are expected?
- Are there contexts in which deployment should be restricted or prohibited?
Datasheets and model cards provide structured evidence that helps answer these questions. They transform deployment reviews from informal trust in a developer’s claims into a more auditable process based on documented information. [arXiv+2arXiv]
Their importance has grown alongside broader AI governance frameworks. The NIST AI Risk Management Framework, for example, emphasises documentation, transparency, measurement, and traceability as part of responsible AI risk management. Documentation helps organisations map risks, evaluate evidence, and justify deployment decisions in a systematic way. [NIST+2ETO AGORA]
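The pre-deployment questions listed earlier in this section can be tracked as an explicit checklist so that unanswered items stay visible. The sketch below is a hypothetical illustration: the question keys, the `unresolved_questions` helper, and the example answers are assumptions, and no governance framework mentioned here prescribes this structure.

```python
from typing import Dict, List, Optional

# Hypothetical keys mirroring the five review questions in this section.
REVIEW_QUESTIONS: List[str] = [
    "training_data_resembles_target_environment",
    "relevant_populations_represented",
    "evaluated_under_realistic_conditions",
    "expected_errors_documented",
    "restricted_contexts_identified",
]


def unresolved_questions(answers: Dict[str, Optional[bool]]) -> List[str]:
    """Return questions answered 'no' or left unanswered, in checklist order."""
    return [q for q in REVIEW_QUESTIONS if answers.get(q) is not True]


# Made-up review state: one 'no' and one question nobody has answered yet.
answers = {
    "training_data_resembles_target_environment": True,
    "relevant_populations_represented": False,
    "evaluated_under_realistic_conditions": True,
    "expected_errors_documented": True,
}
open_items = unresolved_questions(answers)
print("ready for sign-off" if not open_items else f"unresolved: {open_items}")
```

Treating a missing answer the same as a "no" is a deliberate choice in this sketch: it keeps a review from passing simply because a question was never asked.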
Where documentation helps and where it cannot guarantee safety
Datasheets and model cards improve transparency, but they are not a substitute for rigorous testing, monitoring, or governance.
A well-written model card may reveal that a model performs poorly in certain conditions, but the document itself does not fix the problem. Likewise, a datasheet can disclose sampling biases without eliminating them. Documentation helps organisations recognise risks; it does not automatically mitigate them. [arXiv]
There is also a risk of treating documentation as a compliance checklist rather than a meaningful governance tool. Researchers studying AI risk management have warned that documentation practices can become superficial if organisations focus on appearances rather than substantive evaluation and risk reduction. [arXiv]
Another limitation is that documentation depends on truthful and complete reporting. A model card is only as useful as the evidence behind it. For this reason, governance discussions increasingly emphasise audits, reproducible testing, and traceable records alongside documentation requirements. Emerging assurance approaches seek to supplement descriptive reports with stronger forms of verifiable evidence. [arXiv]
Even with these limitations, dataset datasheets and model cards remain among the most practical tools available for understanding an AI system before deployment. They make hidden assumptions visible, clarify intended uses, expose performance limits, and provide a structured basis for deciding whether a system is ready for real-world use. [arXiv+2arXiv]
Further Reading
Books and field guides related to what AI labels should tell us before launch. Use these as the next step if you want deeper reading beyond the article.
Atlas of AI
Explains data origins, AI systems, transparency, accountability, and governance concerns behind documentation efforts.
Artificial Intelligence
Provides foundational coverage of learning objectives and evaluation.
AI Snake Oil
Helps readers evaluate claims, limitations, testing results, and deployment risks that model cards seek to communicate.
Weapons of Math Destruction
Demonstrates why transparency and disclosure matter before automated systems affect people.
Endnotes
-
Source: arxiv.org
Title: arXiv Datasheets for Datasets
Link: https://arxiv.org/abs/1803.09010
Source snippet: Datasheets for Datasets, March 23, 2018...
Published: March 23, 2018
-
Source: arxiv.org
Title: arXiv Model Cards for Model Reporting
Link: https://arxiv.org/abs/1810.03993
-
Source: nist.gov
Link: https://www.nist.gov/itl/ai-risk-management-framework
Source snippet: AI Risk Management Framework. NIST has developed a framework to better manage risks to individuals, organizations, and society associat...
-
Source: agora.eto.tech
Title: AGORA NIST AI Risk Management Framework
Link: https://agora.eto.tech/instrument/772
Source snippet: ETO AGORA: NIST AI Risk Management Framework - ETO AGORA. AI risk measurements include documenting aspects of systems... MEASURE bolster AI...
-
Source: microsoft.com
Link: https://www.microsoft.com/en-us/research/wp-content/uploads/2019/01/1803.09010.pdf
Source snippet: Datasheets for Datasets, by T Gebru · Cited by 4580. The questions are divided into seven categories: motivation for dataset crea...
-
Source: dl.acm.org
Link: https://dl.acm.org/doi/10.1145/3287560.3287596
Source snippet: ACM Digital Library: Model Cards for Model Reporting | Proceedings of the..., by M Mitchell · 2019 · Cited by 4162. In this paper, we propo...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/1810.03993
Source snippet: Model Cards for Model Reporting, by M Mitchell · 2018 · Cited by 4162. Model cards also disclose the context in which models are intended...
-
Source: mdsd4health.com
Title: Datasheets for Datasets
Link: https://www.mdsd4health.com/modules/module-3-mdsd-methods-mediums-pt-i/datasheets-for-datasets
Source snippet: Datasheets for Datasets are documents that disclose the motivation, composition, collection process, recommended uses of a dat...
-
Source: overleaf.com
Link: https://www.overleaf.com/latex/templates/datasheet-for-dataset-template/jgqyyzyprxth
Source snippet: Datasheet for dataset template. Document [the dataset] motivation, composition, collection process, recommended uses, and so on. [They] hav...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/1803.09010
Source snippet: Datasheets for Datasets, by T Gebru · 2018 · Cited by 4541. dataset be accompanied with a datasheet that documents its motivation, compo...
-
Source: iapp.org
Title: 5 things to know about ai model cards
Link: https://iapp.org/news/a/5-things-to-know-about-ai-model-cards
Source snippet: 23 Aug 2023: Model cards are short documents provided with machine learning models that explain the context in which the models are inte...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/328189552_Model_Cards_for_Model_Reporting
Source snippet: Model Cards for Model Reporting. Concurrently, the proposal of Model cards are concise reports accompanying ML models that detail their int...
-
Source: practical-ai-act.eu
Link: https://practical-ai-act.eu/latest/engineering-practice/model-cards/
Source snippet: Model cards. Model cards are a somewhat standardized form of documentation that provide a comprehensive overview of an AI model, including...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2401.15229
Source snippet: Evolving AI Risk Management: A Maturity Model based on the NIST AI Risk Management Framework, January 26, 2024...
Published: January 26, 2024
-
Source: arxiv.org
Link: https://arxiv.org/abs/2511.12668
Source snippet: AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework, November 16, 2025...
Published: November 16, 2025
-
Source: nist.gov
Link: https://www.nist.gov/
Source snippet: National Institute of Standards and Technology. NIST promotes U.S. innovation and industrial competitiveness by advancing measurement scien...
-
Source: researchgate.net
Title: Datasheets for Datasets
Link: https://www.researchgate.net/publication/324055506_Datasheets_for_Datasets
Source snippet: Datasheets for Datasets, 3 May 2026: We propose the concept of a datasheet for datasets, a short document to accompany public datasets, co...
Published: May 2026
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/386668632_Datasheets_for_Datasets
Source snippet: Datasheets for Datasets. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, compositi...
-
Source: youtube.com
Title: Model Cards for Model Reporting
Link: https://www.youtube.com/watch?v=saAUB_MG2d0
Source snippet: Datasheets for Datasets help ML engineers notice and understand ethical issues in training data...
-
Source: ainowinstitute.org
Title: datasheets for datasets
Link: https://ainowinstitute.org/publications/datasheets-for-datasets
Source snippet: 22 Feb 2023: every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended u...
-
Source: mlr3fairness.mlr-org.com
Title: mlr3 Fairness Datasheet for dataset “add dataset name here”
Link: https://mlr3fairness.mlr-org.com/articles/datasheet/datasheet.html
Source snippet: mlr3 Fairness: Datasheet for dataset “add dataset name here” - mlr3fairness. Motivation, Composition, Collection process, Preprocessing/cleaning...
-
Source: aisecurityandsafety.org
Title: datasheets for datasets
Link: https://aisecurityandsafety.org/en/glossary/datasheets-for-datasets/
Source snippet: AI Security & Safety Directory: Datasheets for Datasets, AI Governance Definition & Guide, 27 Mar 2026: Standardized documentation for mach...
-
Source: verifywise.ai
Link: https://verifywise.ai/de/ai-governance-library/transparency-and-documentation/model-cards-paper
Source snippet: Model Cards for Model Reporting | KI-Governance-Bibliothek. Model cards provide standardized documentation covering intended uses, performa...
-
Source: alan-turing-institute.github.io
Title: Alan Turing Institute Model Cards
Link: https://alan-turing-institute.github.io/tea-techniques/techniques/model-cards/
Source snippet: Alan Turing Institute: Model Cards - TEA Techniques. Model cards are standardised documentation frameworks that systematically document machi...
-
Source: emergentmind.com
Title: model cards for model reporting
Link: https://www.emergentmind.com/topics/model-cards-for-model-reporting
Source snippet: Model Cards for Reporting AI Models, 18 Mar 2026: They enable transparency and regulatory compliance by including sections on intended use...
-
Source: sentinelone.com
Title: nist ai risk management framework
Link: https://www.sentinelone.com/cybersecurity-101/cybersecurity/nist-ai-risk-management-framework/
Source snippet: What is the NIST AI Risk Management Framework? Oct 14, 2025: The NIST artificial intelligence risk management framework (AI RMF) guides o...
-
Source: github.com
Link: https://github.com/AudreyBeard/Datasheets-for-Datasets-Template/blob/master/refs.bib
Source snippet: AudreyBeard/Datasheets-for-Datasets-Template: every dataset be accompanied with a datasheet that documents its motivation, composition, col...
-
Source: ui.adsabs.harvard.edu
Link: https://ui.adsabs.harvard.edu/abs/2018arXiv180309010G/abstract
Source snippet: for Datasets - ADS, by T Gebru · 2018 · Cited by 4536. We propose that every dataset be accompanied with a datasheet that documents its mo...
-
Source: edwinwenink.github.io
Title: model card
Link: https://edwinwenink.github.io/ai-ethics-tool-landscape/tools/model-card/
Source snippet: s for Model Reporting, 13 Jul 2021: Model cards are short documents accompanying trained machine learning models that provide benchmarked...
-
Source: info4940.infosci.cornell.edu
Title: model card
Link: https://info4940.infosci.cornell.edu/project/proj-01/model-card.html
Source snippet: card, 9 Nov 2025: It provides a summary of the model's performance and limitations, as well as the context in which it was trained and use...
-
Source: ai-solutions.daviesmeyer.com
Title: datasheets for datasets
Link: https://ai-solutions.daviesmeyer.com/en/glossary/datasheets-for-datasets
Source snippet: for Datasets Explained - Hamburg. Standardized documentation for ML datasets describing provenance, composition, collection methods, recomm...
Additional References
-
Source: adeptiv.ai
Link: https://adeptiv.ai/nist-ai-rmf-guide-to-ai-risk-management-systems/
Source snippet: AI Governance & Risk Management | NIST AI RMF Guide. The primary objective of the NIST AI RMF is to help organizations identify, assess, ma...
-
Source: merriam-webster.com
Link: https://www.merriam-webster.com/dictionary/model
Source snippet: MODEL Definition & Meaning: 1. a usually miniature representation of something; a plastic model of the human heart; also: a pattern of som...
-
Source: medium.com
Link: https://medium.com/%40tahirbalarabe2/model-cards-explained-b14cd7c9439e
Source snippet: Model Cards Explained. Shoutout to Google | by Tahir. By clearly stating intended use cases and out-of-scope scenarios, Model Cards help no...
-
Source: mbrenndoerfer.com
Link: https://mbrenndoerfer.com/writing/model-cards-documentation-intended-use-limitations-best-practices
Source snippet: Model Cards: Documentation, Intended Use, and Limitations. Learn how to write model cards that communicate intended use, training data, eva...
-
Source: domino.ai
Link: https://domino.ai/solutions/nist-risk-management
Source snippet: NIST AI risk management framework. Domino Governance supports the NIST AI Risk Management Framework (RMF) standards with one universal syst...
-
Source: docs.modulos.ai
Link: https://docs.modulos.ai/frameworks/nist-ai-rmf/
Source snippet: AI Risk Management Framework 1.0 (NIST AI RMF). Complete guide to the NIST AI Risk Management Framework 1.0 (AI RMF 1.0): the four core fun...
-
Source: morgan-klaus.com
Link: https://www.morgan-klaus.com/readings/datasheets-for-datasets.html
-
Source: medium.com
Link: https://medium.com/%40akankshasinha247/model-cards-datasheets-governance-frameworks-0cda9605c94e
-
Source: panaseer.com
Link: https://panaseer.com/resources/blog/delivering-responsible-ai-with-model-cards
-
Source: ateam-oracle.com
Link: https://www.ateam-oracle.com/ciso-perspectives-a-practical-guide-to-implementing-the-nist-ai-risk-management-framework-ai-rmf
Source snippet: The NIST AI RMF provides a structured approach for addressing risks related to AI...