
Why AI fairness cannot be one score

Fairness metrics can reveal uneven harm, but different measures often disagree about what a fair outcome should mean.

On this page

  • Demographic parity, equal opportunity, and equalized odds
  • Why average accuracy misses group level harm
  • How organisations choose and justify tradeoffs

Introduction

Fairness metrics are tools used to evaluate whether an AI system distributes benefits, errors, and harms unevenly across different groups. They matter because overall accuracy can hide serious disparities: a model that appears highly accurate on average may still make disproportionately harmful mistakes for particular populations. Yet fairness assessment is not simply a matter of calculating a single score. Different fairness metrics embody different ideas about what counts as fair treatment, and those ideas can conflict with one another. As a result, organisations must make explicit governance choices about which forms of unfairness they are most concerned about and why. Researchers and standards bodies increasingly emphasise that fairness is not one universal property but a set of competing objectives that require context-sensitive judgement. [NIST AI Resource Center]

Why average accuracy misses group-level harm

Machine-learning systems are usually trained to maximise predictive performance. A model can therefore achieve impressive headline accuracy while performing significantly worse for certain demographic groups.

Consider a screening system used for university admissions, lending, healthcare, or hiring. If one group experiences a much higher rate of false rejections than another, the average accuracy figure may remain high even though opportunities are being distributed unequally. Fairness metrics were developed to expose these hidden patterns by comparing outcomes across groups rather than looking only at aggregate performance. NIST’s AI Risk Management Framework recommends measuring disparities across groups and examining how harms are distributed, rather than relying solely on overall performance indicators. [NIST AI Resource Center; EPIC]

This shift reflects an important governance principle: a system can be statistically effective while still producing outcomes that many stakeholders would regard as unacceptable.
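The gap between a headline figure and group-level harm can be made concrete with a small sketch. The confusion-matrix counts below are invented purely for illustration, and the group names and helper functions are hypothetical rather than taken from any particular toolkit.

```python
# Hypothetical confusion-matrix counts for two groups (illustrative numbers only).
counts = {
    "group_a": {"tp": 400, "fn": 20, "tn": 450, "fp": 30},
    "group_b": {"tp": 25, "fn": 25, "tn": 45, "fp": 5},
}

def accuracy(c):
    """Share of all decisions that were correct."""
    return (c["tp"] + c["tn"]) / sum(c.values())

def false_rejection_rate(c):
    """Share of genuinely qualified people (tp + fn) who were rejected."""
    return c["fn"] / (c["tp"] + c["fn"])

# Pool both groups to get the headline figure.
pooled = {k: sum(c[k] for c in counts.values()) for k in ("tp", "fn", "tn", "fp")}
overall_accuracy = accuracy(pooled)

per_group = {
    g: {"accuracy": accuracy(c), "false_rejection_rate": false_rejection_rate(c)}
    for g, c in counts.items()
}

print(f"overall accuracy: {overall_accuracy:.2f}")
for g, m in per_group.items():
    print(g, {k: round(v, 3) for k, v in m.items()})
```

With these made-up counts the headline accuracy is 92%, yet half of the qualified people in group_b are rejected, compared with about 5% in group_a.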

Demographic parity, equal opportunity, and equalized odds

Different fairness metrics focus on different kinds of equality. Understanding their differences helps explain why fairness debates often persist even after extensive measurement.

Demographic parity

Demographic parity, sometimes called statistical parity, asks whether different groups receive positive outcomes at the same rate. For example, if a hiring system recommends candidates from two demographic groups, demographic parity examines whether both groups are selected in roughly equal proportions. [dida; Latitude]

Its strength is simplicity. It directly highlights unequal allocation of opportunities or resources. However, it does not consider whether individuals in each group actually met the underlying qualification criteria. A model could satisfy demographic parity while still making many incorrect decisions. [MIT OpenCourseWare]
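A minimal sketch of the measurement, assuming binary predictions (1 = selected) and one group label per person. The function names and the toy data are hypothetical. The example is built so that both groups are selected at the same rate even though every decision for one group is wrong, which is the limitation described above.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Fraction of each group that receives the positive outcome (prediction == 1)."""
    selected, total = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        total[group] += 1
        selected[group] += pred
    return {g: selected[g] / total[g] for g in total}

def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 1 = qualified / selected, 0 = not.
groups = ["a"] * 4 + ["b"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]  # every decision for group "b" is wrong

dp_gap = demographic_parity_difference(y_pred, groups)
errors_in_b = sum(t != p for t, p, g in zip(y_true, y_pred, groups) if g == "b")
print(selection_rates(y_pred, groups), dp_gap, errors_in_b)
```

Note that the true labels play no part in the parity calculation itself; they are used here only to count the errors that the metric cannot see.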

Equal opportunity

Equal opportunity focuses on qualified individuals. It requires that people who genuinely deserve a positive outcome have the same chance of receiving it regardless of group membership. Technically, it seeks parity in true positive rates across groups. [Google for Developers]

This metric is attractive in domains where missing qualified candidates is especially harmful. For example, a recruitment system might be judged fair if equally qualified applicants from different groups have equal chances of being recommended. However, equal opportunity pays less attention to false positive errors, meaning groups could still experience different rates of incorrect approvals. [Google for Developers]
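As a rough sketch under the same assumptions (binary labels and predictions, one group label per person), the true positive rate can be computed per group and compared. The helper name and data are hypothetical. The toy data is chosen so that the true positive rates match while the false positive rates do not.

```python
def rate_by_group(y_true, y_pred, groups, actual):
    """For each group, the share of people whose true label equals `actual`
    and who were predicted positive. actual=1 gives TPR, actual=0 gives FPR."""
    hits, totals = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        if t == actual:
            totals[g] = totals.get(g, 0) + 1
            hits[g] = hits.get(g, 0) + p
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical toy data: 1 = qualified / approved, 0 = not.
groups = ["a"] * 4 + ["b"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

tpr = rate_by_group(y_true, y_pred, groups, actual=1)
fpr = rate_by_group(y_true, y_pred, groups, actual=0)
equal_opportunity_gap = max(tpr.values()) - min(tpr.values())

print("TPR:", tpr, "gap:", equal_opportunity_gap)
print("FPR:", fpr)
```

Here the equal opportunity gap is zero, yet unqualified people in group "b" are always approved while those in group "a" never are.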

Equalized odds

Equalized odds goes further by requiring both true positive rates and false positive rates to be similar across groups. In effect, it asks whether the model correctly approves qualified people, and wrongly approves unqualified people, at comparable rates in every group. [PMC]

Because it examines both major error types, equalized odds is often considered a stronger fairness requirement than equal opportunity. Yet achieving it can require sacrificing other goals, including some forms of predictive performance or other fairness measures. [PMC]
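For reference, the three criteria in this section can be written side by side in the standard textbook form. The notation is an assumption introduced here, not taken from the cited sources: \hat{Y} is the model's binary decision, Y is the true outcome, and A is group membership with values a and b.

```latex
\begin{align*}
\text{Demographic parity:} \quad & P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b) \\
\text{Equal opportunity:} \quad & P(\hat{Y}=1 \mid Y=1, A=a) = P(\hat{Y}=1 \mid Y=1, A=b) \\
\text{Equalized odds:} \quad & P(\hat{Y}=1 \mid Y=y, A=a) = P(\hat{Y}=1 \mid Y=y, A=b), \quad y \in \{0, 1\}
\end{align*}
```

Written this way, equal opportunity is the y = 1 half of equalized odds, and demographic parity is the only one of the three that never conditions on the true outcome.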

Why fairness metrics often disagree

One of the most important discoveries in fairness research is that many desirable fairness properties cannot all be satisfied simultaneously.

A major source of conflict arises when groups have different underlying outcome rates, often called different base rates. In such situations, mathematical results from researchers including Alexandra Chouldechova and the team of Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan show that several widely used fairness criteria become incompatible. A system may satisfy one fairness metric only by violating another. [ResearchGate; Hexdocs]

For example:

  • Demographic parity may require equal selection rates across groups. [ispartnersllc.com]
  • Equalized odds may require equal true positive and false positive rates across groups. [ispartnersllc.com]
  • Predictive parity or calibration may require equal meaning of risk scores across groups.

When base rates differ, achieving all of these goals at the same time is generally impossible except in special circumstances. [arXiv; ResearchGate; Hexdocs]

This result is sometimes described as the fairness impossibility theorem. Its practical implication is significant: fairness debates are often not disagreements about mathematics but disagreements about which definition of fairness should take priority in a particular context. [arXiv]
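One way to see the conflict numerically is through an identity that follows directly from the confusion-matrix definitions: for a group with base rate p, FPR = p / (1 - p) × (1 - PPV) / PPV × TPR. The sketch below is illustrative only; the function name and the numbers are assumptions. It fixes the same positive predictive value (predictive parity) and the same true positive rate for two groups, then shows what false positive rate each group is forced to have.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate forced by the confusion-matrix definitions:
    FPR = p / (1 - p) * (1 - PPV) / PPV * TPR, where p is the base rate."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Hypothetical setting: both groups get the same PPV and the same TPR,
# but their base rates differ.
ppv, tpr = 0.7, 0.6
fpr_a = implied_fpr(base_rate=0.3, ppv=ppv, tpr=tpr)
fpr_b = implied_fpr(base_rate=0.5, ppv=ppv, tpr=tpr)

print(f"FPR group a: {fpr_a:.3f}, FPR group b: {fpr_b:.3f}")
```

Under these assumed values the false positive rates come out at roughly 0.110 and 0.257, so predictive parity and equalized odds cannot both hold. They coincide only if the base rates are equal or in degenerate cases such as a perfect predictor.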

The COMPAS debate as a real-world example

The debate surrounding the COMPAS recidivism prediction system became a landmark illustration of competing fairness metrics. Researchers and journalists examining the system found evidence of racial disparities in some error rates, while defenders pointed out that the system exhibited forms of calibration across groups. [ProPublica; Allen Downey]

The disagreement persisted partly because different parties emphasised different fairness criteria:

  • Critics focused on unequal false positive and false negative rates.
  • Defenders highlighted calibration and predictive consistency across groups.
  • Both sides could support their claims using recognised fairness measures. [Allen Downey; arXiv]

The lesson was not that one side misunderstood the statistics. Rather, the case demonstrated that different fairness metrics capture different moral and policy concerns. A system can appear fair according to one metric while appearing unfair according to another. [arXiv; CSE Department]

How organisations choose and justify tradeoffs

Because fairness metrics can conflict, governance becomes central. The key question is no longer simply whether a model is fair, but which fairness objective should be prioritised and why.

NIST guidance recommends using fairness metrics alongside context-specific assessments of harm and stakeholder impacts rather than relying on a single universal measure. Organisations are encouraged to examine disparities across groups, consider the consequences of different error types, and engage affected communities when defining acceptable outcomes. [NIST AI Resource Center]

The appropriate tradeoff often depends on the application:

  • In medical diagnosis, missing a serious illness may be considered more harmful than issuing an unnecessary follow-up test, making equal opportunity especially relevant.
  • In criminal justice, unequal false positive rates may raise concerns about disproportionate burdens on particular groups, making equalized odds more attractive.
  • In resource allocation programmes, demographic parity may be emphasised when equitable access is the primary objective. [PMC; MIT OpenCourseWare]

Good governance therefore requires documenting which fairness metric was chosen, what harms it is intended to address, what tradeoffs it creates, and how those decisions align with legal, ethical, and organisational responsibilities. Fairness measurement becomes less a search for a perfect score and more a process of making value-laden choices transparent and accountable. [Scrut; NIST AI Resource Center]
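One possible way to make that documentation habit concrete is a small structured record kept alongside the model. The sketch below is purely illustrative: the class name, field names, and example values are hypothetical and do not come from NIST or any other cited source.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FairnessDecisionRecord:
    """Hypothetical record of a fairness-metric choice; field names are illustrative."""
    system: str
    chosen_metric: str
    harms_addressed: list[str]
    known_tradeoffs: list[str]
    justification: str
    stakeholders_consulted: list[str] = field(default_factory=list)

    def missing_fields(self):
        """Names of required entries left empty, so gaps are visible at review time."""
        required = ("system", "chosen_metric", "harms_addressed",
                    "known_tradeoffs", "justification")
        return [name for name in required if not getattr(self, name)]

record = FairnessDecisionRecord(
    system="example screening model",
    chosen_metric="equal opportunity",
    harms_addressed=["qualified applicants wrongly rejected"],
    known_tradeoffs=["false positive rates may still differ across groups"],
    justification="",  # deliberately left blank to show the check
)
print(record.missing_fields())
```

The point of the empty-field check is modest: a metric choice recorded without a justification or a list of tradeoffs is flagged rather than silently accepted.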

Why AI fairness cannot be one score

The central insight of fairness metrics is that unfairness can take multiple forms. One community may be concerned about unequal access to opportunities, another about unequal error rates, and another about whether risk scores mean the same thing for everyone. Each concern leads to a different metric.

As fairness research has matured, the field has moved away from the idea that a single numerical score can certify an AI system as fair. Instead, fairness evaluation is increasingly understood as a governance exercise that combines technical measurement with human judgement about which harms matter most in a particular setting. The metrics reveal disparities, but they do not decide which tradeoffs society should accept. [Nature; NIST AI Resource Center]

Endnotes

  1. Source: airc.nist.gov
    Title: AI Resource Center Measure
    Link: https://airc.nist.gov/airmf-resources/playbook/measure/
    Source snippet

    NIST AI Resource CenterMeasure - AIRCQuantify harms using both a general fairness metric, if appropriate (e.g. demographic parity, equali...

  2. Source: airc.nist.gov
    Title: AI Resource Center AI Risks and Trustworthiness
    Link: https://airc.nist.gov/airmf-resources/airmf/3-sec-characteristics/
    Source snippet

    Risks and Trustworthiness - AIRCFairness in AI includes concerns for equality and equity by addressing issues such as harmful bias and di...

  3. Source: nature.com
    Link: https://www.nature.com/articles/s41598-022-07939-1
    Source snippet

    A clarification of the nuances in the fairness metrics...by A Castelnovo · 2022 · Cited by 392 — A plethora of different definitio...

  4. Source: epic.org
    Link: https://epic.org/framing-the-risk-management-framework-actionable-instructions-by-nist-in-the-measure-section-of-the-ai-rmf/
    Source snippet

    erence, equal opportunity difference, average absolute odds...Read more...

  5. Source: nvlpubs.nist.gov
    Title: AI.600 1
    Link: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
    Source snippet

    NIST PublicationsArtificial Intelligence Risk Management Frameworkby N AI · 2024 · Cited by 78 — on GAI, apply general fairness metrics (...

  6. Source: dida.do
    Title: fairness in ml
    Link: https://dida.do/blog/fairness-in-ml
    Source snippet

    Fairness in Machine Learning | dida blog3 Jan 2024 — This notion defines fairness as the probability of a given prediction being equa...

  7. Source: latitude.so
    Link: https://latitude.so/blog/how-to-compare-fairness-metrics-for-model-selection
    Source snippet

    How to Compare Fairness Metrics for Model Selection17 Feb 2025 — Demographic Parity: Measures selection rate differences between...

  8. Source: ocw.mit.edu
    Title: Fairness Criteria
    Link: https://ocw.mit.edu/courses/res-ec-001-exploring-fairness-in-machine-learning-for-international-development-spring-2020/pages/module-three-framework/fairness-criteria/
    Source snippet

    MIT OpenCourseWareFairness Criteria | Exploring Fairness in Machine Learning...Discuss how to choose between different fairness criteria...

  9. Source: developers.google.com
    Title: equality of opportunity
    Link: https://developers.google.com/machine-learning/crash-course/fairness/equality-of-opportunity
    Source snippet

    Google for DevelopersFairness: Equality of opportunity | Machine Learning3 Dec 2025 — Learn how to use the equality of opportunity metric...

  10. Source: pmc.ncbi.nlm.nih.gov
    Title: PMCAlgorithmic fairness in computational medicine
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9463525/
    Source snippet

    Unlike demographic parity, equalized odds is a definition of Fairness that allows the prediction to depend on protected attribute, but...

  11. Source: arxiv.org
    Title: arXiv Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds
    Link: https://arxiv.org/abs/2405.07393

  12. Source: researchgate.net
    Link: https://www.researchgate.net/publication/314153531_Fair_prediction_with_disparate_impact_A_study_of_bias_in_recidivism_prediction_instruments
    Source snippet

    Fair prediction with disparate impact: A study of bias in...Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2...

  13. Source: hexdocs.pm
    Link: https://hexdocs.pm/ex_fairness/
    Source snippet

    README — ExFairness v0.5.1Demographic parity and equalized odds are often in conflict; Predictive parity and equalized odds cannot both b...

  14. Source: arxiv.org
    Link: https://arxiv.org/abs/2208.12606

  15. Source: arxiv.org
    Title: arXiv The Impossibility Theorem of Machine Fairness – A Causal Perspective
    Link: https://arxiv.org/abs/2007.06024

  16. Source: propublica.org
    Title: How We Analyzed the COMPAS Recidivism Algorithm
    Link: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
    Source snippet

    How We Analyzed the COMPAS Recidivism Algorithm23 May 2016 — We set out to assess one of the commercial tools made by Northpoin...

    Published: May 2016

  17. Source: hdsr.mitpress.mit.edu
    Link: https://hdsr.mitpress.mit.edu/pub/7z10o269
    Source snippet

    Harvard Data Science ReviewThe Age of Secrecy and Unfairness in Recidivism Predictionby C Rudin · 2020 · Cited by 327 — COMPAS is used th...

  18. Source: arxiv.org
    Link: https://arxiv.org/pdf/2007.06024
    Source snippet

    the impossibility theorem of machine fairnessby KK Saravanakumar · 2020 · Cited by 30 — This report led to three metrics of fairness...

  19. Source: arxiv.org
    Link: https://arxiv.org/abs/1906.04711
    Source snippet

    This paper takes a...

  20. Source: scrut.io
    Link: https://www.scrut.io/glossary/fairness-and-bias-nist-ai-rmf
    Source snippet

    Fairness and bias (NIST AI RMF)Learn how the NIST AI RMF addresses unfair or discriminatory outcomes through data quality, testing...

  21. Source: arxiv.org
    Link: https://arxiv.org/pdf/1906.04711
    Source snippet

    COMPAS data is used in an increasing number of studies to test various definitions of algorithmic fairness. This paper takes a...Read more...

  22. Source: arxiv.org
    Link: https://arxiv.org/pdf/2501.01889
    Source snippet

    An Investigation into Custom Loss Functions for Fairness...by G Lee · 2025 · Cited by 1 — This paper explores the complex tradeoffs betw...

  23. Source: nvlpubs.nist.gov
    Link: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf
    Source snippet

    a Standard for Identifying and Managing Bias in...by R Schwartz · 2022 · Cited by 761 — This document, and work by the National Institut...

  24. Source: nist.gov
    Title: theres more ai bias biased data nist report highlights
    Link: https://www.nist.gov/news-events/news/2022/03/theres-more-ai-bias-biased-data-nist-report-highlights
    Source snippet

    There's More to AI Bias Than Biased Data, NIST Report...Mar 16, 2022 — The NIST report acknowledges that a great deal of AI bias stems f...

  25. Source: developers.google.com
    Title: responsible ai
    Link: https://developers.google.com/machine-learning/glossary/responsible-ai
    Source snippet

    Learning Glossary: Responsible AIApr 13, 2026 — A mathematical definition of "fairness" that is measurable. Some commonly used fairness m...

  26. Source: allendowney.github.io
    Link: https://allendowney.github.io/RecidivismCaseStudy/02_calibration.html
    Source snippet

    Allen DowneyAlgorithmic Fairness — Recidivism Case StudyCOMPAS is calibrated in the sense that White and Black defendants with the same r...

  27. Source: cse.buffalo.edu
    Link: https://cse.buffalo.edu/~kjoseph/mlsoc/support/notes/compas/index.html
    Source snippet

    CSE DepartmentAlgorithms and Society Notes: COMPAS datasetThe ProPublica article is considered a landmark article especially within folks...

  28. Source: mbrenndoerfer.com
    Title: fairness metrics
    Link: https://mbrenndoerfer.com/writing/fairness-metrics
    Source snippet

    Demographic Parity, Equalized Odds...16 Mar 2026 — Learn the key mathematical definitions of algorithmic fairness, from demographic pari...

  29. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12436242/
    Source snippet

    Defining Fairness in Machine Learning for Healthby J Gao · 2025 · Cited by 15 — Additional extensions of group fairness criteria, includi...

  30. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12823528/
    Source snippet

    in AI systems: integrating formal and socio-technical...by A Ahmad · 2026 — The fairness test can be: (i) Group fairness, which checks...

  31. Source: urbanspatial.github.io
    Link: https://urbanspatial.github.io/PublicPolicyAnalytics/people-based-ml-models-algorithmic-fairness.html
    Source snippet

    Chapter 7 People-Based ML Models: Algorithmic FairnessThis paradox lead ProPublica to ask a fascinating question - “how could an algorith...

  32. Source: afraenkel.github.io
    Title: 05 parity measures
    Link: https://afraenkel.github.io/fairness-book/content/05-parity-measures.html
    Source snippet

    Parity Measures — Fairness & Algorithmic Decision...If A and C are not independent of Y, then Demographic Parity and Equalized Odds Pari...

Additional References

  1. Source: swept.ai
    Link: https://www.swept.ai/ai-bias-fairness
    Source snippet

    AI Bias and Fairness: Detection, Metrics & MitigationAI bias creates unfair outcomes across protected groups. Learn how to detect bias in...

  2. Source: codesignal.com
    Link: https://codesignal.com/learn/courses/privacy-bias-and-fairness-in-ai/lessons/fairness-frameworks
    Source snippet

    Fairness Frameworks | CodeSignal LearnThis dialogue highlights the importance of using fairness metrics like demographic parity and equal...

  3. Source: ieiespc.org
    Link: https://ieiespc.org/ieiespc/XmlViewer/f437628
    Source snippet

    Fairness Measure, Bias Mitigation Techniques and...AIF360 offers more than 70 fairness metrics and 14 bias mitigation algorithms designe...

  4. Source: ruivieira.dev
    Link: https://ruivieira.dev/fairness-in-machine-learning.html
    Source snippet

    Fairness in Machine LearningCounterfactual Fairness. Demographic parity: This metric assesses whether the probability of a positive outco...

  5. Source: ispartnersllc.com
    Link: https://www.ispartnersllc.com/hubs/nist-ai-rmf/measure/
    Source snippet

    NIST AI RMF Principle: MeasureBias and Fairness. Use demographic parity (ensuring equal performance across groups) and equalized odds (si...

  6. Source: hyperproof.io
    Link: https://hyperproof.io/navigating-the-nist-ai-risk-management-framework/
    Source snippet

    Navigating the NIST AI Risk Management FrameworkFairness “in AI includes concerns for equality and equity by addressing issues such as ha...

  7. Source: montrealethics.ai
    Link: https://montrealethics.ai/its-compaslicated-the-messy-relationship-between-rai-datasets-and-algorithmic-fairness-[benchmarks
    Source snippet

    It's COMPASlicated: The Messy Relationship between RAI...2 Mar 2022 — In this paper, the authors first overview many of the issues surro...

  8. Source: unece.org
    Link: https://unece.org/sites/default/files/2025-10/Companion%20Paper%20on%20Fairness%20in%20Machine%20Learning_Responsible%20AI%20Framework_ADSaMM%20group.pdf

  9. Source: hrlr.law.columbia.edu
    Title: reprogramming fairness affirmative action in algorithmic criminal sentencing
    Link: https://hrlr.law.columbia.edu/hrlr-online/reprogramming-fairness-affirmative-action-in-algorithmic-criminal-sentencing/
    Source snippet

    Action in Algorithmic Criminal Sentencing15 Apr 2020 — In 2016, the investigative journalism non-profit, ProPublica, alleged that a popul...

  10. Source: youtube.com
    Link: https://www.youtube.com/watch?v=VMBE2mgpxH8
