Why AI fairness cannot be one score
Fairness metrics can reveal uneven harm, but different measures often disagree about what a fair outcome should mean.
Introduction
Fairness metrics are tools used to evaluate whether an AI system distributes benefits, errors, and harms unevenly across different groups. They matter because overall accuracy can hide serious disparities: a model that appears highly accurate on average may still make disproportionately harmful mistakes for particular populations. Yet fairness assessment is not simply a matter of calculating a single score. Different fairness metrics embody different ideas about what counts as fair treatment, and those ideas can conflict with one another. As a result, organisations must make explicit governance choices about which forms of unfairness they are most concerned about and why. Researchers and standards bodies increasingly emphasise that fairness is not one universal property but a set of competing objectives that require context-sensitive judgement. [NIST AI Resource Center]
Why average accuracy misses group-level harm
Machine-learning systems are usually trained to maximise predictive performance averaged over all training examples. Because smaller groups contribute less to that average, a model can achieve impressive headline accuracy while performing significantly worse for certain demographic groups.
Consider a screening system used for university admissions, lending, healthcare, or hiring. If one group experiences a much higher rate of false rejections than another, the average accuracy figure may remain high even though opportunities are being distributed unequally. Fairness metrics were developed to expose these hidden patterns by comparing outcomes across groups rather than looking only at aggregate performance. NIST's AI Risk Management Framework recommends measuring disparities across groups and examining how harms are distributed, rather than relying solely on overall performance indicators. [NIST AI Resource Center; EPIC]
This shift reflects an important governance principle: a system can be statistically effective while still producing outcomes that many stakeholders would regard as unacceptable.
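A small, hypothetical example makes the point concrete. The sketch below assumes binary labels (1 means the person deserved a positive outcome) and invented data for two groups labelled "A" and "B". The function names and numbers are illustrative assumptions, not taken from any real system.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Per group: share of truly positive cases that the model rejected."""
    positives = defaultdict(int)
    missed = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 0:
                missed[g] += 1
    return {g: missed[g] / positives[g] for g in positives}


# Hypothetical data: 80 applicants in group "A", 20 in group "B".
groups = ["A"] * 80 + ["B"] * 20
y_true = [1] * 40 + [0] * 40 + [1] * 10 + [0] * 10
# Group A is classified perfectly; half of the qualified
# applicants in group B are wrongly rejected.
y_pred = [1] * 40 + [0] * 40 + [1] * 5 + [0] * 5 + [0] * 10

overall = accuracy(y_true, y_pred)
fnr = false_negative_rate_by_group(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.2f}")
print(f"false rejection rate by group: {fnr}")
```

In this toy data the model is right 95 times out of 100, yet every one of its mistakes is a false rejection in group B, where half of the qualified applicants are turned away.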
Demographic parity, equal opportunity, and equalized odds
Different fairness metrics focus on different kinds of equality. Understanding their differences helps explain why fairness debates often persist even after extensive measurement.
Demographic parity
Demographic parity, sometimes called statistical parity, asks whether different groups receive positive outcomes at the same rate. For example, if a hiring system recommends candidates from two demographic groups, demographic parity examines whether both groups are selected in roughly equal proportions. [dida; Latitude]
Its strength is simplicity. It directly highlights unequal allocation of opportunities or resources. However, it does not consider whether individuals in each group actually met the underlying qualification criteria. A model could satisfy demographic parity while still making many incorrect decisions. [MIT OpenCourseWare]
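As a rough illustration, demographic parity can be checked by comparing selection rates between groups. The sketch below uses invented data and hypothetical helper names; it also shows the weakness of the metric, since selection rates can match even when every decision in one group is wrong.

```python
def selection_rates(y_pred, groups):
    """Per group: share of individuals who received the positive outcome."""
    totals, selected = {}, {}
    for p, g in zip(y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        selected[g] = selected.get(g, 0) + (1 if p == 1 else 0)
    return {g: selected[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: ten candidates per group, five qualified in each.
groups = ["A"] * 10 + ["B"] * 10
y_true = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
# Group A: the five qualified candidates are selected.
# Group B: five candidates are selected, but they are the unqualified ones.
y_pred = [1] * 5 + [0] * 5 + [0] * 5 + [1] * 5

gap = demographic_parity_difference(y_pred, groups)
errors_b = sum(1 for t, p, g in zip(y_true, y_pred, groups) if g == "B" and t != p)
print(f"selection rates: {selection_rates(y_pred, groups)}")
print(f"demographic parity difference: {gap}")
print(f"wrong decisions in group B: {errors_b} of 10")
```

Both groups are selected at the same rate, so the parity gap is zero, but the labels in `y_true` play no part in that calculation.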
Equal opportunity
Equal opportunity focuses on qualified individuals. It requires that people who genuinely deserve a positive outcome have the same chance of receiving it regardless of group membership. Technically, it seeks parity in true positive rates across groups. [Google for Developers]
This metric is attractive in domains where missing qualified candidates is especially harmful. For example, a recruitment system might be judged fair if equally qualified applicants from different groups have equal chances of being recommended. However, equal opportunity pays less attention to false positive errors, meaning groups could still experience different rates of incorrect approvals. [Google for Developers]
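A minimal sketch of an equal opportunity check follows, assuming binary labels and at least one qualified person in every group. The data and function names are hypothetical.

```python
def true_positive_rates(y_true, y_pred, groups):
    """Per group: share of qualified individuals who received the positive outcome."""
    qualified, approved = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            qualified[g] = qualified.get(g, 0) + 1
            approved[g] = approved.get(g, 0) + (1 if p == 1 else 0)
    # Assumes every group contains at least one qualified individual.
    return {g: approved[g] / qualified[g] for g in qualified}


def equal_opportunity_difference(y_true, y_pred, groups):
    """Largest gap in true positive rate between any two groups."""
    tpr = true_positive_rates(y_true, y_pred, groups)
    return max(tpr.values()) - min(tpr.values())


# Hypothetical data: four qualified and four unqualified applicants per group.
groups = ["A"] * 8 + ["B"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]

tpr = true_positive_rates(y_true, y_pred, groups)
eo_gap = equal_opportunity_difference(y_true, y_pred, groups)
print(f"true positive rate by group: {tpr}")
print(f"equal opportunity difference: {eo_gap}")
```

Only rows where `y_true` is 1 enter the calculation, which is why this metric says nothing about how unqualified applicants are treated.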
Equalized odds
Equalized odds goes further by requiring both true positive rates and false positive rates to be similar across groups. In effect, it asks whether the model correctly approves qualified individuals and wrongly approves unqualified ones at comparable rates in every group. [PMC]
Because it examines both major error types, equalized odds is often considered a stronger fairness requirement than equal opportunity. Yet achieving it can require sacrificing other goals, including some forms of predictive performance or other fairness measures. [PMC]
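The sketch below shows one way such a check might look, using invented data in which true positive rates match across groups but false positive rates do not. The helper names are hypothetical, and the code assumes each group contains both qualified and unqualified individuals.

```python
def group_rates(y_true, y_pred, groups):
    """Per group: (true positive rate, false positive rate)."""
    counts = {}
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if t == 1 and p == 1:
            c["tp"] += 1
        elif t == 1:
            c["fn"] += 1
        elif p == 1:
            c["fp"] += 1
        else:
            c["tn"] += 1
    # Assumes each group has at least one positive and one negative case.
    return {
        g: (c["tp"] / (c["tp"] + c["fn"]), c["fp"] / (c["fp"] + c["tn"]))
        for g, c in counts.items()
    }


def equalized_odds_gaps(y_true, y_pred, groups):
    """Return (TPR gap, FPR gap); both must be near zero for equalized odds."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Hypothetical data: four qualified and four unqualified applicants per group.
groups = ["A"] * 8 + ["B"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 1, 1, 0, 0]

tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, groups)
print(f"rates by group (TPR, FPR): {group_rates(y_true, y_pred, groups)}")
print(f"TPR gap: {tpr_gap}, FPR gap: {fpr_gap}")
```

Here the true positive rate gap is zero, so equal opportunity holds, while the false positive rate gap shows that equalized odds does not.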
Why fairness metrics often disagree
One of the most important discoveries in fairness research is that many desirable fairness properties cannot all be satisfied simultaneously.
A major source of conflict arises when groups have different underlying outcome rates, often called different base rates. In such situations, mathematical results from researchers including Alexandra Chouldechova and the team of Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan show that several widely used fairness criteria become incompatible. A system may satisfy one fairness metric only by violating another. [ResearchGate; Hexdocs]
For example:
- Demographic parity may require equal selection rates across groups. [ispartnersllc.com]
- Equalized odds may require equal error rates across groups. [ispartnersllc.com]
- Predictive parity or calibration may require equal meaning of risk scores across groups.
When base rates differ, achieving all of these goals at the same time is generally impossible except in special circumstances. [arXiv; ResearchGate; Hexdocs]
This result is sometimes described as the fairness impossibility theorem. Its practical implication is significant: fairness debates are often not disagreements about mathematics but disagreements about which definition of fairness should take priority in a particular context. [arXiv]
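The conflict can be seen with simple arithmetic. For any binary classifier, the confusion-matrix definitions imply FPR = (p / (1 - p)) × ((1 - PPV) / PPV) × (1 - FNR), where p is the group's base rate, PPV is the positive predictive value, and FNR is the false negative rate. This is the relationship usually associated with Chouldechova's 2017 paper. The sketch below plugs in invented numbers for two hypothetical groups that share the same PPV and FNR but have different base rates.

```python
def implied_false_positive_rate(base_rate, ppv, fnr):
    """False positive rate forced by a given base rate, PPV, and FNR.

    Follows from the confusion-matrix definitions:
    FPR = (p / (1 - p)) * ((1 - PPV) / PPV) * (1 - FNR)
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical groups: identical predictive value and miss rate,
# but different underlying base rates.
ppv, fnr = 0.8, 0.2
fpr_a = implied_false_positive_rate(base_rate=0.5, ppv=ppv, fnr=fnr)
fpr_b = implied_false_positive_rate(base_rate=0.2, ppv=ppv, fnr=fnr)

print(f"group A false positive rate: {fpr_a:.3f}")
print(f"group B false positive rate: {fpr_b:.3f}")
```

With these inputs the implied false positive rates come out at roughly 0.20 and 0.05. Holding predictive parity and equal false negative rates fixed leaves no freedom to equalise false positive rates unless the base rates are equal or the classifier is perfect.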
The COMPAS debate as a real-world example
The debate surrounding the COMPAS recidivism prediction system became a landmark illustration of competing fairness metrics. Researchers and journalists examining the system found evidence of racial disparities in some error rates, while defenders pointed out that the system exhibited forms of calibration across groups. [ProPublica; Allen Downey]
The disagreement persisted partly because different parties emphasised different fairness criteria:
- Critics focused on unequal false positive and false negative rates.
- Defenders highlighted calibration and predictive consistency across groups.
- Both sides could support their claims using recognised fairness measures. [Allen Downey; arXiv]
The lesson was not that one side misunderstood the statistics. Rather, the case demonstrated that different fairness metrics capture different moral and policy concerns. A system can appear fair according to one metric while appearing unfair according to another. [arXiv; CSE Department]
How organisations choose and justify tradeoffs
Because fairness metrics can conflict, governance becomes central. The key question is no longer simply whether a model is fair, but which fairness objective should be prioritised and why.
NIST guidance recommends using fairness metrics alongside context-specific assessments of harm and stakeholder impacts rather than relying on a single universal measure. Organisations are encouraged to examine disparities across groups, consider the consequences of different error types, and engage affected communities when defining acceptable outcomes. [NIST AI Resource Center]
The appropriate tradeoff often depends on the application:
- In medical diagnosis, missing a serious illness may be considered more harmful than issuing an unnecessary follow-up test, making equal opportunity especially relevant.
- In criminal justice, unequal false positive rates may raise concerns about disproportionate burdens on particular groups, making equalized odds more attractive.
- In resource allocation programmes, demographic parity may be emphasised when equitable access is the primary objective. [PMC; MIT OpenCourseWare]
Good governance therefore requires documenting which fairness metric was chosen, what harms it is intended to address, what tradeoffs it creates, and how those decisions align with legal, ethical, and organisational responsibilities. Fairness measurement becomes less a search for a perfect score and more a process of making value-laden choices transparent and accountable. [Scrut; NIST AI Resource Center]
Why AI fairness cannot be one score
The central insight of fairness metrics is that unfairness can take multiple forms. One community may be concerned about unequal access to opportunities, another about unequal error rates, and another about whether risk scores mean the same thing for everyone. Each concern leads to a different metric.
As fairness research has matured, the field has moved away from the idea that a single numerical score can certify an AI system as fair. Instead, fairness evaluation is increasingly understood as a governance exercise that combines technical measurement with human judgement about which harms matter most in a particular setting. The metrics reveal disparities, but they do not decide which tradeoffs society should accept. [Nature; NIST AI Resource Center]
Further Reading
Books for readers who want to go deeper than this article.
Fairness and Machine Learning
Directly covers fairness metrics and competing definitions of fairness.
Weapons of Math Destruction
Provides real-world motivation for measuring unfair outcomes.
Endnotes
-
Source: airc.nist.gov
Title: AI Resource Center Measure
Link: https://airc.nist.gov/airmf-resources/playbook/measure/
Source snippet
NIST AI Resource CenterMeasure - AIRCQuantify harms using both a general fairness metric, if appropriate (e.g. demographic parity, equali...
-
Source: airc.nist.gov
Title: AI Resource Center AI Risks and Trustworthiness
Link: https://airc.nist.gov/airmf-resources/airmf/3-sec-characteristics/
Source snippet
Risks and Trustworthiness - AIRCFairness in AI includes concerns for equality and equity by addressing issues such as harmful bias and di...
-
Source: nature.com
Link: https://www.nature.com/articles/s41598-022-07939-1
Source snippet
A clarification of the nuances in the fairness metrics...by A Castelnovo · 2022 · Cited by 392 — A plethora of different definitio...
-
Source: epic.org
Link: https://epic.org/framing-the-risk-management-framework-actionable-instructions-by-nist-in-the-measure-section-of-the-ai-rmf/
Source snippet
erence, equal opportunity difference, average absolute odds...Read more...
-
Source: nvlpubs.nist.gov
Title: AI.600 1
Link: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
Source snippet
NIST PublicationsArtificial Intelligence Risk Management Frameworkby N AI · 2024 · Cited by 78 — on GAI, apply general fairness metrics (...
-
Source: dida.do
Title: fairness in ml
Link: https://dida.do/blog/fairness-in-ml
Source snippet
Fairness in Machine Learning | dida blog3 Jan 2024 — This notion defines fairness as the probability of a given prediction being equa...
-
Source: latitude.so
Link: https://latitude.so/blog/how-to-compare-fairness-metrics-for-model-selection
Source snippet
How to Compare Fairness Metrics for Model Selection17 Feb 2025 — Demographic Parity: Measures selection rate differences between...
-
Source: ocw.mit.edu
Title: Content. Confusion matrix.Read more
Link: https://ocw.mit.edu/courses/res-ec-001-exploring-fairness-in-machine-learning-for-international-development-spring-2020/pages/module-three-framework/fairness-criteria/
Source snippet
MIT OpenCourseWareFairness Criteria | Exploring Fairness in Machine Learning...Discuss how to choose between different fairness criteria...
-
Source: developers.google.com
Title: equality of opportunity
Link: https://developers.google.com/machine-learning/crash-course/fairness/equality-of-opportunity
Source snippet
Google for DevelopersFairness: Equality of opportunity | Machine Learning3 Dec 2025 — Learn how to use the equality of opportunity metric...
-
Source: pmc.ncbi.nlm.nih.gov
Title: PMCAlgorithmic fairness in computational medicine
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9463525/
Source snippet
Unlike demographic parity, equalized odds is a definition of Fairness that allows the prediction to depend on protected attribute, but...
-
Source: arxiv.org
Title: arXiv Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds
Link: https://arxiv.org/abs/2405.07393
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/314153531_Fair_prediction_with_disparate_impact_A_study_of_bias_in_recidivism_prediction_instruments
Source snippet
Fair prediction with disparate impact: A study of bias in...Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2...
-
Source: hexdocs.pm
Link: https://hexdocs.pm/ex_fairness/
Source snippet
README — ExFairness v0.5.1Demographic parity and equalized odds are often in conflict; Predictive parity and equalized odds cannot both b...
-
Source: arxiv.org
Link: https://arxiv.org/abs/2208.12606
-
Source: arxiv.org
Title: arXiv The Impossibility Theorem of Machine Fairness – A Causal Perspective
Link: https://arxiv.org/abs/2007.06024
-
Source: propublica.org
Title: to discover the underlying accuracy of their recidivism algorithm.Read more
Link: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Source snippet
How We Analyzed the COMPAS Recidivism Algorithm23 May 2016 — We set out to assess one of the commercial tools made by Northpoin...
Published: May 2016
-
Source: hdsr.mitpress.mit.edu
Link: https://hdsr.mitpress.mit.edu/pub/7z10o269
Source snippet
Harvard Data Science ReviewThe Age of Secrecy and Unfairness in Recidivism Predictionby C Rudin · 2020 · Cited by 327 — COMPAS is used th...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/2007.06024
Source snippet
the impossibility theorem of machine fairnessby KK Saravanakumar · 2020 · Cited by 30 — This report led to three metrics of fairness...
-
Source: arxiv.org
Link: https://arxiv.org/abs/1906.04711
Source snippet
This paper takes a...
-
Source: scrut.io
Link: https://www.scrut.io/glossary/fairness-and-bias-nist-ai-rmf
Source snippet
Fairness and bias (NIST AI RMF)Learn how the NIST AI RMF addresses unfair or discriminatory outcomes through data quality, testing...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/1906.04711
Source snippet
COMPAS data is used in an increasing number of studies to test various definitions of algorithmic fairness. This paper takes a...Read more...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/2501.01889
Source snippet
An Investigation into Custom Loss Functions for Fairness...by G Lee · 2025 · Cited by 1 — This paper explores the complex tradeoffs betw...
-
Source: nvlpubs.nist.gov
Link: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf
Source snippet
a Standard for Identifying and Managing Bias in...by R Schwartz · 2022 · Cited by 761 — This document, and work by the National Institut...
-
Source: nist.gov
Title: theres more ai bias biased data nist report highlights
Link: https://www.nist.gov/news-events/news/2022/03/theres-more-ai-bias-biased-data-nist-report-highlights
Source snippet
There's More to AI Bias Than Biased Data, NIST Report...Mar 16, 2022 — The NIST report acknowledges that a great deal of AI bias stems f...
-
Source: developers.google.com
Title: responsible ai
Link: https://developers.google.com/machine-learning/glossary/responsible-ai
Source snippet
Learning Glossary: Responsible AIApr 13, 2026 — A mathematical definition of "fairness" that is measurable. Some commonly used fairness m...
-
Source: allendowney.github.io
Link: https://allendowney.github.io/RecidivismCaseStudy/02_calibration.html
Source snippet
Allen DowneyAlgorithmic Fairness — Recidivism Case StudyCOMPAS is calibrated in the sense that White and Black defendants with the same r...
-
Source: cse.buffalo.edu
Link: https://cse.buffalo.edu/~kjoseph/mlsoc/support/notes/compas/index.html
Source snippet
CSE DepartmentAlgorithms and Society Notes: COMPAS datasetThe ProPublica article is considered a landmark article especially within folks...
-
Source: mbrenndoerfer.com
Title: fairness metrics
Link: https://mbrenndoerfer.com/writing/fairness-metrics
Source snippet
Demographic Parity, Equalized Odds...16 Mar 2026 — Learn the key mathematical definitions of algorithmic fairness, from demographic pari...
-
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12436242/
Source snippet
Defining Fairness in Machine Learning for Healthby J Gao · 2025 · Cited by 15 — Additional extensions of group fairness criteria, includi...
-
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12823528/
Source snippet
in AI systems: integrating formal and socio-technical...by A Ahmad · 2026 — The fairness test can be: (i) Group fairness, which checks...
-
Source: urbanspatial.github.io
Link: https://urbanspatial.github.io/PublicPolicyAnalytics/people-based-ml-models-algorithmic-fairness.html
Source snippet
Chapter 7 People-Based ML Models: Algorithmic FairnessThis paradox lead ProPublica to ask a fascinating question - “how could an algorith...
-
Source: afraenkel.github.io
Title: 05 parity measures
Link: https://afraenkel.github.io/fairness-book/content/05-parity-measures.html
Source snippet
Parity Measures — Fairness & Algorithmic Decision...If A and C are not independent of Y, then Demographic Parity and Equalized Odds Pari...
Additional References
-
Source: swept.ai
Link: https://www.swept.ai/ai-bias-fairness
Source snippet
AI Bias and Fairness: Detection, Metrics & MitigationAI bias creates unfair outcomes across protected groups. Learn how to detect bias in...
-
Source: codesignal.com
Link: https://codesignal.com/learn/courses/privacy-bias-and-fairness-in-ai/lessons/fairness-frameworks
Source snippet
Fairness Frameworks | CodeSignal LearnThis dialogue highlights the importance of using fairness metrics like demographic parity and equal...
-
Source: ieiespc.org
Link: https://ieiespc.org/ieiespc/XmlViewer/f437628
Source snippet
Fairness Measure, Bias Mitigation Techniques and...AIF360 offers more than 70 fairness metrics and 14 bias mitigation algorithms designe...
-
Source: ruivieira.dev
Link: https://ruivieira.dev/fairness-in-machine-learning.html
Source snippet
Fairness in Machine LearningCounterfactual Fairness. Demographic parity: This metric assesses whether the probability of a positive outco...
-
Source: ispartnersllc.com
Link: https://www.ispartnersllc.com/hubs/nist-ai-rmf/measure/
Source snippet
NIST AI RMF Principle: MeasureBias and Fairness. Use demographic parity (ensuring equal performance across groups) and equalized odds (si...
-
Source: hyperproof.io
Link: https://hyperproof.io/navigating-the-nist-ai-risk-management-framework/
Source snippet
Navigating the NIST AI Risk Management FrameworkFairness “in AI includes concerns for equality and equity by addressing issues such as ha...
-
Source: montrealethics.ai
Link: https://montrealethics.ai/its-compaslicated-the-messy-relationship-between-rai-datasets-and-algorithmic-fairness-benchmarks
Source snippet
It's COMPASlicated: The Messy Relationship between RAI...2 Mar 2022 — In this paper, the authors first overview many of the issues surro...
-
Source: unece.org
Link: https://unece.org/sites/default/files/2025-10/Companion%20Paper%20on%20Fairness%20in%20Machine%20Learning_Responsible%20AI%20Framework_ADSaMM%20group.pdf
-
Source: hrlr.law.columbia.edu
Title: reprogramming fairness affirmative action in algorithmic criminal sentencing
Link: https://hrlr.law.columbia.edu/hrlr-online/reprogramming-fairness-affirmative-action-in-algorithmic-criminal-sentencing/
Source snippet
Action in Algorithmic Criminal Sentencing15 Apr 2020 — In 2016, the investigative journalism non-profit, ProPublica, alleged that a popul...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=VMBE2mgpxH8