Why facial recognition errors are not evenly shared

Introduction

Facial recognition became one of the most widely discussed examples of AI bias because it revealed a crucial lesson: a system can appear highly accurate overall while still making substantially more mistakes for some groups of people than others. Early evaluations often reported a single accuracy figure, but later research showed that performance could vary across race, sex, age, and combinations of those characteristics. In practical terms, the risks of being wrongly identified, wrongly excluded, or subjected to additional scrutiny may not be shared equally across a population. Studies from academic researchers, independent auditors, and the US National Institute of Standards and Technology (NIST) helped turn facial recognition into a defining case study of how biased data and uneven learned patterns can produce unequal outcomes in AI systems. [NIST]

Face bias illustration 1

What demographic testing revealed

For many years, facial recognition systems were primarily evaluated using overall accuracy scores. Those averages often hid important differences between demographic groups. When researchers began testing systems separately by race, sex, and age, a more complex picture emerged.

One influential example was the 2018 Gender Shades study by Joy Buolamwini and Timnit Gebru. The researchers evaluated commercial gender-classification systems from IBM, Microsoft, and Face++ and found dramatic differences in error rates across groups defined by skin type and gender. Lighter-skinned men were classified most accurately, with error rates below 1% for some systems, while error rates for darker-skinned women reached as high as 34.7%. The study drew wide attention because it showed that performance gaps could be extremely large even when vendors advertised high overall accuracy. [Proceedings of Machine Learning Research]

The findings were reinforced by larger evaluations. In 2019, NIST tested 189 algorithms from 99 developers and reported that the majority exhibited what it called “demographic differentials”: measurable differences in performance across demographic groups. The size of these differentials varied by algorithm, but many systems showed substantially higher error rates for certain populations. False-positive rates were often elevated for people of African and East Asian ancestry compared with some European-origin groups, and they also tended to be higher for women than for men, and for children and older adults than for middle-aged adults. [CSIS, NIST, NIST Publications]

Importantly, these disparities were not identical across all systems. Some algorithms performed far better than others, demonstrating that uneven outcomes were not an unavoidable property of facial recognition itself. Algorithm design, training data, and evaluation practices all influenced the results. [ASIS International, Security Industry Association]

Why average accuracy can be misleading

A common misunderstanding is that a facial recognition system with 99% accuracy must work equally well for everyone. In reality, averages can conceal large differences between groups.

Imagine a system used on millions of people. If the overall error rate is low but one demographic group has a false-match rate ten times that of another, the mistakes are concentrated on that group rather than spread evenly. NIST’s demographic analysis found that, in some algorithms, false-positive rates differed across demographic groups by factors ranging from ten to more than one hundred. [PMC]
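
The arithmetic can be sketched in a few lines. The counts below are invented for two hypothetical groups, A and B, and are not drawn from NIST’s results; they only show how an overall false match rate (FMR), which is a comparison-weighted average of the group rates, can look small while one group absorbs most of the errors.

```python
# Invented counts for two hypothetical groups (not NIST data): impostor
# comparisons attempted and false matches observed in each group.
counts = {
    "group A": {"comparisons": 900_000, "false_matches": 90},
    "group B": {"comparisons": 100_000, "false_matches": 100},
}

def fmr(c):
    """False match rate: false matches per impostor comparison."""
    return c["false_matches"] / c["comparisons"]

total_comparisons = sum(c["comparisons"] for c in counts.values())
total_false = sum(c["false_matches"] for c in counts.values())
overall = total_false / total_comparisons  # the single "headline" figure

print(f"overall FMR: {overall:.5f}")
for name, c in counts.items():
    print(f"{name}: FMR {fmr(c):.4f}, "
          f"{c['comparisons'] / total_comparisons:.0%} of comparisons, "
          f"{c['false_matches'] / total_false:.0%} of false matches")

# How many times higher group B's rate is than group A's.
ratio = fmr(counts["group B"]) / fmr(counts["group A"])
print(f"differential (B vs A): {ratio:.0f}x")
```

With these made-up figures the overall rate is 0.00019, yet group B, at 10% of comparisons, accounts for about 53% of all false matches.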

This is why researchers increasingly insist on reporting disaggregated results rather than a single headline accuracy figure. Looking only at overall performance can make a system appear fair even when particular groups face substantially higher risks. The lesson extends beyond facial recognition and applies broadly across AI systems: averages do not automatically reveal who bears the cost of mistakes. [ResearchGate]
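
A disaggregated report computes the same metric separately for each group, and for combinations of groups, instead of once for everyone. The sketch below is a minimal illustration using invented evaluation records with hypothetical skin-type and sex labels; the helpers `make_records` and `error_rates` are our own names, not from any library. The intersectional view surfaces a worse subgroup than either single-attribute view, echoing the intersectional point made by Gender Shades.

```python
from collections import defaultdict

def make_records(skin, sex, total, wrong):
    """Build `total` toy records for one subgroup, `wrong` of them misclassified."""
    return [{"skin": skin, "sex": sex, "correct": i >= wrong} for i in range(total)]

# Invented evaluation results: 100 test images per subgroup.
records = (
    make_records("lighter", "male", 100, 1)
    + make_records("lighter", "female", 100, 7)
    + make_records("darker", "male", 100, 12)
    + make_records("darker", "female", 100, 30)
)

def error_rates(records, keys):
    """Error rate for every combination of the attributes named in `keys`.

    An empty `keys` list puts everyone in one group (the headline figure).
    """
    totals, errors = defaultdict(int), defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        if not r["correct"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

for keys in ([], ["skin"], ["sex"], ["skin", "sex"]):
    label = " x ".join(keys) or "overall"
    for group, rate in sorted(error_rates(records, keys).items()):
        name = " ".join(group) or "everyone"
        print(f"{label:<10} {name:<16} {rate:.1%}")
```

With these invented numbers the overall error is 12.5%, the worst single-attribute group is at 21.0%, and the worst intersection (darker female) is at 30.0%.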

False matches and false non-matches as real harms

Understanding facial recognition bias requires distinguishing between two major categories of error.

Face bias illustration 2

False matches

A false match occurs when a system incorrectly decides that two images belong to the same person. In identification settings, this can cause an innocent individual to be linked to someone else.

Researchers and regulators pay particular attention to false matches because they can have serious consequences in policing, border control, security screening, and other identity-sensitive applications. NIST’s demographic testing found that many algorithms produced higher false-positive rates for some racial and ethnic groups than for others. In operational settings, this means members of certain groups may face a greater chance of being incorrectly flagged. [NIST Publications]
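
In score-based terms, a system compares two face images, produces a similarity score, and declares a match when the score reaches a threshold. The false match rate for a group is the share of impostor comparisons (images of two different people) whose score clears that threshold. The sketch below uses invented scores and an assumed threshold of 0.6 to show how one global threshold can yield different false match rates when score distributions differ between groups. NIST’s own methodology is more involved, so treat this only as an illustration of the definition.

```python
# Invented impostor similarity scores: each compares images of two *different*
# people from the same hypothetical group. Higher means "more alike".
impostor_scores = {
    "group A": [0.12, 0.31, 0.45, 0.22, 0.58, 0.40, 0.18, 0.27, 0.35, 0.61],
    "group B": [0.52, 0.66, 0.71, 0.44, 0.63, 0.38, 0.69, 0.57, 0.48, 0.74],
}
THRESHOLD = 0.6  # assumed operating point: scores >= 0.6 are declared a match

def false_match_rate(scores, threshold):
    """Share of impostor comparisons wrongly declared a match."""
    return sum(score >= threshold for score in scores) / len(scores)

for group, scores in impostor_scores.items():
    print(f"{group}: FMR {false_match_rate(scores, THRESHOLD):.0%}")
```

Real evaluations use far larger samples and far smaller rates, but the mechanism is the same: the threshold is shared, while the score distributions are not.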

False non-matches

A false non-match occurs when a system fails to recognise that two images belong to the same person. This error can prevent legitimate access to services, devices, or secure locations.

Although false non-matches often receive less public attention than false matches, they can create unequal burdens. A traveller may be delayed at an automated border gate, or a user may repeatedly fail an identity verification process. If these failures occur disproportionately for particular demographic groups, the convenience promised by automation becomes unevenly distributed. [NIST]

The key point is that different applications make different errors more important. A phone-unlocking system and a police search system may both use facial recognition, but the social consequences of their mistakes are very different.
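
One way to see why the application matters is that a single threshold controls both error types at once: raising it reduces false matches but increases false non-matches. The sketch below sweeps a few thresholds over invented genuine (same-person) and impostor (different-person) scores for one hypothetical population; which row is acceptable is a policy choice that differs between, say, unlocking a phone and searching a police database.

```python
# Invented similarity scores for one hypothetical population.
genuine = [0.55, 0.62, 0.70, 0.74, 0.78, 0.81, 0.85, 0.88, 0.91, 0.95]   # same person
impostor = [0.10, 0.18, 0.25, 0.31, 0.38, 0.44, 0.52, 0.58, 0.63, 0.72]  # different people

def rates(threshold):
    """Return (FMR, FNMR) when scores >= threshold count as a match."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr

for t in (0.5, 0.6, 0.7, 0.8):
    fmr, fnmr = rates(t)
    print(f"threshold {t:.1f}: FMR {fmr:.0%}, FNMR {fnmr:.0%}")
```

Because the invented genuine and impostor scores overlap, no threshold drives both rates to zero, so the operator has to decide which error is costlier for the application at hand.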

Why representation in image data matters

One major explanation for unequal error rates involves the data used to train and test AI systems.

Machine-learning models learn patterns from examples. If certain groups appear less frequently in training datasets, the model may have fewer opportunities to learn reliable representations of those faces. The Gender Shades researchers found that prominent face datasets used in the field contained disproportionately large numbers of lighter-skinned individuals, creating concerns about how well systems would generalise to more diverse populations. [Proceedings of Machine Learning Research]
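
A first, crude check on representation is to count how often each group appears in a dataset. The sketch below assumes each image already carries a group annotation (itself a non-trivial labelling task) and uses invented counts; the helper `representation_report` is a hypothetical name, and the 10% minimum share is an arbitrary illustrative cut-off, not an established standard.

```python
from collections import Counter

def representation_report(labels, min_share):
    """Map each group to (share of dataset, flagged as under-represented?)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: (n / total, n / total < min_share)
        for group, n in counts.most_common()
    }

# Invented annotations for a hypothetical 1,000-image training set.
labels = (
    ["lighter male"] * 460
    + ["lighter female"] * 340
    + ["darker male"] * 130
    + ["darker female"] * 70
)

for group, (share, flagged) in representation_report(labels, min_share=0.10).items():
    note = "  <- under-represented" if flagged else ""
    print(f"{group:<15} {share:5.1%}{note}")
```

Balanced counts alone do not guarantee equal performance, since image quality and capture conditions also matter.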

Representation affects more than simple counts. Differences in image quality, lighting conditions, camera equipment, pose, age distribution, and collection practices can all influence model performance. Researchers have also shown that image quality and demographic composition interact in complex ways, meaning that performance gaps cannot always be explained by a single factor. [arXiv]

As a result, improving fairness is not simply a matter of adding more images. Developers increasingly focus on collecting more representative datasets, testing systems across multiple demographic categories, and measuring performance separately for different groups before deployment. [ResearchGate]
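
Measuring performance separately before deployment can be turned into an explicit check. The sketch below is a hypothetical pre-deployment gate: it refuses to judge groups with too few test trials and flags the release if the worst group’s error rate exceeds the best group’s by more than a chosen factor. The function name `audit`, the counts, and both limits (`MIN_TRIALS`, `MAX_RATIO`) are assumptions made up for illustration; real acceptance criteria are a policy decision.

```python
# Invented per-group test results: (errors observed, trials run).
results = {
    "group A": (12, 4000),
    "group B": (30, 4000),
    "group C": (2, 150),
}
MIN_TRIALS = 1000  # hypothetical: below this, a group's rate is too noisy to judge
MAX_RATIO = 2.0    # hypothetical tolerance for worst vs best group error rate

def audit(results, min_trials, max_ratio):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    rates = {}
    for group, (errors, trials) in results.items():
        if trials < min_trials:
            problems.append(f"{group}: only {trials} trials, need {min_trials}")
        else:
            rates[group] = errors / trials
    if rates:
        best, worst = min(rates.values()), max(rates.values())
        if best > 0:
            ratio = worst / best
        else:
            # A zero best rate makes any nonzero worst rate an unbounded gap.
            ratio = float("inf") if worst > 0 else 1.0
        if ratio > max_ratio:
            problems.append(f"worst/best error-rate ratio {ratio:.1f} exceeds {max_ratio}")
    return problems

for problem in audit(results, MIN_TRIALS, MAX_RATIO):
    print(problem)
```

Group C is set aside rather than passed: with only 150 trials, an observed rate of 2 in 150 says little about the true rate, which is one reason representative test sets matter as much as representative training sets.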

Face bias illustration 3

Why this case became a landmark example of AI bias

Facial recognition attracted unusual attention because the evidence was measurable and concrete. Researchers could compare error rates across groups, identify disparities, and independently test commercial systems. The findings transformed public discussions about AI fairness from abstract concerns into observable performance differences. [MIT News]

The debate also demonstrated that bias is not always visible in headline performance numbers. A system can perform well overall while imposing greater risks on particular populations. That insight has influenced how researchers evaluate many other forms of AI, encouraging demographic testing, subgroup analysis, and fairness audits as standard parts of system assessment. Facial recognition therefore became more than a controversy about one technology; it became a widely cited example of how biased data and uneven learned patterns can produce unequal outcomes in real-world AI systems. [NIST, PMC]

Marketplace Samples

Example marketplace items related to this page. Use the search link to explore similar finds on eBay.

WARNING MAY SPONTANEOUSLY START TALKING ABOUT DATA SCIENCE T-SHIRT

Trust The Process Algorithmic Data Science Design T-Shirt

Data Drives Decisions Mens T-Shirt Data Science Technology Fathers Day Gift

Data Encoder I Love Statistics Data Science Data Analysts T-Shirt

Search eBay.co.uk: data science t shirt

Example items shown for inspiration; availability and pricing can change. Branchoria may earn a commission if you purchase through outbound eBay links.

Endnotes

Source: nist.gov
Title: NIST Study Evaluates Effects of Race, Age, Sex on Face Recognition Software
Link: https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-face-recognition-software
Source snippet
19 Dec 2019: A new NIST study examines how accurately face recognition sof...
Source: news.mit.edu
Title: Study finds gender and skin-type bias in commercial artificial intelligence systems
Link: https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212
Source snippet
11 Feb 2018: For darker-skinned women, those assigned scores of IV, V, o...
Source: nvlpubs.nist.gov
Title: Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects (NISTIR 8280)
Link: https://nvlpubs.nist.gov/nistpubs/ir/2019/nist.ir.8280.pdf
Source snippet
By P. Grother, 2019 (cited by 93). False positives: Using t...
Source: csis.org
Title: The Problem of Bias in Facial Recognition
Link: https://www.csis.org/blogs/strategic-technologies-blog/problem-bias-facial-recognition
Source snippet
1 May 2020: NIST found that Asians, African Americans, and American Indians generally had highe...
Published: May 2020
Source: pmc.ncbi.nlm.nih.gov
Title: Beating the bias in facial recognition technology
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7575263/
Source snippet
By J. Lunter, 2020 (cited by 38). NIST recently ran a large-scale test focused on identifying bias in FRT, with a particular em...
Source: researchgate.net
Title: Gender shades: intersectional phenotypic and demographic evaluation of face datasets and gender classifiers
Link: https://www.researchgate.net/publication/323722163_Gender_shades_intersectional_phenotypic_and_demographic_evaluation_of_face_datasets_and_gender_classifiers
Source snippet
For example, Buolamwini (2017) found that facial recognition technology is m...
Source: arxiv.org
Title: Characterizing the Variability in Face Recognition Accuracy Relative to Race
Link: https://arxiv.org/abs/1904.07325
Published: April 15, 2019
Source: researchgate.net
Title: Demographic effects on estimates of automatic face recognition performance
Link: https://www.researchgate.net/publication/224238108_Demographic_effects_on_estimates_of_automatic_face_recognition_performance
Source snippet
Specifically, these studies suggested that face recognition is less acc...
Source: nist.gov
Title: National Institute of Standards and Technology
Link: https://www.nist.gov/
Source snippet
NIST promotes U.S. innovation and industrial competitiveness by advancing measurement scien...
Source: pages.nist.gov
Title: FRVT Demographics
Link: https://pages.nist.gov/frvt/html/frvt_demographics.html
Source snippet
False positives can in principle occur...
Source: arxiv.org
Title: Review of Demographic Bias in Face Recognition
Link: https://arxiv.org/html/2502.02309v1
Source snippet
4 Feb 2025: The Face Recognition Vendor Test (FRVT) conducted by NIST [15] substantiated t...
Source: researchgate.net
Title: Review of Demographic Bias in Face Recognition (PDF)
Link: https://www.researchgate.net/publication/388685657_Review_of_Demographic_Bias_in_Face_Recognition
Source snippet
4 Feb 2025: Demographic bias in face recognition (FR) has emerged as a critical area...
Source: proceedings.mlr.press
Title: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
Link: https://proceedings.mlr.press/v81/buolamwini18a.html
Source snippet
By J. Buolamwini, 2018 (cited by 10693).
Source: proceedings.mlr.press
Title: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (PDF)
Link: https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
Source snippet
Darker females have the highest error rates for all gender...
Source: asisonline.org
Title: Facial Recognition Error Rates Vary by Demographic
Link: https://www.asisonline.org/security-management-magazine/articles/2020/05/facial-recognition-error-rates-vary-by-demographic/
Source snippet
1 May 2020: In the NIST study, not all algorithms gave these high ra...
Published: May 2020
Source: securityindustry.org
Title: What NIST Data Shows About Facial Recognition and Demographics
Link: https://www.securityindustry.org/report/what-nist-data-shows-about-facial-recognition-and-demographics/
Source snippet
6 Feb 2020: The report specifically identifies six sup...
Source: Wikipedia
Title: National Institute of Standards and Technology
Link: https://en.wikipedia.org/wiki/National_Institute_of_Standards_and_Technology
Source snippet
The National Institute of Standards and Technology (NIST) is an agency of the United Sta...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7879975/
Source snippet
This is the only study...

Further Reading

Unmasking AI

Atlas of AI

Weapons of Math Destruction

Race After Technology
