The Study That Changed Face AI Testing
The Gender Shades study made facial-analysis bias visible by measuring errors across skin tone and gender together.
Introduction
In 2018, the Gender Shades study transformed the discussion about fairness in artificial intelligence by showing that face-analysis systems did not make mistakes equally across all people. Rather than asking whether commercial systems were accurate on average, researchers Joy Buolamwini and Timnit Gebru examined how accuracy changed when both skin tone and gender were considered together. Their results revealed a striking pattern: darker-skinned women experienced far higher error rates than any other group. In the worst case, the error rate for darker-skinned women reached 34.7%, while the highest error rate for lighter-skinned men was only 0.8%. The study became a landmark because it made a previously hidden problem measurable and impossible to ignore. [Proceedings of Machine Learning Research]
What the researchers tested
Before Gender Shades, many developers and companies reported overall accuracy scores for facial-analysis systems. Those aggregate figures could create the impression that a system worked well for everyone, even if performance varied dramatically between groups.
The researchers took a different approach. They created a benchmark designed to balance gender and skin type and then evaluated commercial gender-classification systems from IBM, Microsoft and Face++. Instead of reporting one overall score, they measured performance separately for four groups:
- Lighter-skinned men [dspace.mit.edu]
- Lighter-skinned women [news.mit.edu]
- Darker-skinned men [thegradient.pub]
- Darker-skinned women [news.mit.edu]
This method is known as an intersectional evaluation because it examines how multiple characteristics combine rather than treating them independently. The study showed that analysing gender alone or skin tone alone would have missed some of the most important disparities. [Proceedings of Machine Learning Research]
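The per-group measurement described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the record layout and the names `Record` and `error_rates_by_subgroup` are assumptions made for this example, and the sample records are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """One evaluated face image (hypothetical layout, not the study's format)."""
    skin: str       # "lighter" or "darker"
    gender: str     # ground-truth label: "female" or "male"
    predicted: str  # label returned by the classifier


def error_rates_by_subgroup(records):
    """Return {(skin, gender): error rate} for every subgroup present."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        key = (r.skin, r.gender)
        totals[key] += 1
        if r.predicted != r.gender:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Invented toy data: four images per subgroup.
sample = (
    [Record("lighter", "male", "male")] * 4
    + [Record("lighter", "female", "female")] * 3
    + [Record("lighter", "female", "male")]
    + [Record("darker", "male", "male")] * 3
    + [Record("darker", "male", "female")]
    + [Record("darker", "female", "female")] * 2
    + [Record("darker", "female", "male")] * 2
)

rates = error_rates_by_subgroup(sample)
for (skin, gender), rate in sorted(rates.items()):
    print(f"{skin:8s} {gender:7s} error rate = {rate:.2f}")
```

Keying the tallies on the pair `(skin, gender)` rather than on either attribute alone is what makes the evaluation intersectional: each combination gets its own denominator.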
A crucial part of the research was its examination of the datasets used to evaluate face systems. The researchers found that widely used benchmarks were heavily skewed towards lighter-skinned subjects, with approximately 79.6% lighter-skinned individuals in IJB-A and 86.2% in Adience. Such imbalances created conditions in which poor performance on underrepresented groups could remain largely invisible. [Proceedings of Machine Learning Research; DSpace]
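One way to surface this kind of skew before any model is evaluated is to tabulate a benchmark's composition. The sketch below assumes each image carries a skin-type and gender label; the function name `composition` and the toy counts are hypothetical and are not the actual figures for IJB-A or Adience.

```python
from collections import Counter


def composition(labels):
    """Return each subgroup's share of the dataset as a fraction of 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Invented label list for a 100-image benchmark (not real dataset figures).
labels = (
    [("lighter", "male")] * 55
    + [("lighter", "female")] * 25
    + [("darker", "male")] * 15
    + [("darker", "female")] * 5
)

shares = composition(labels)
# Sum the shares of every subgroup whose skin label is "lighter".
lighter_share = sum(s for (skin, _), s in shares.items() if skin == "lighter")

for group, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{group}: {share:.0%}")
print(f"lighter-skinned overall: {lighter_share:.0%}")
```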
Why darker-skinned women stood out
The most influential finding was not simply that errors existed, but that they clustered around a specific intersection of characteristics.
Across the tested systems, darker-skinned women were consistently the most misclassified group. Reported error rates ranged from 20.8% to 34.7%, depending on the system. By contrast, error rates for lighter-skinned men were extremely low, never exceeding 0.8%. [Proceedings of Machine Learning Research]
This mattered because the disparity could not be explained by gender alone. Women generally experienced higher error rates than men, but the rates for darker-skinned women were substantially higher than those for lighter-skinned women. Nor could the gap be explained by skin tone alone, because the systems generally performed better on darker-skinned men than on darker-skinned women. The interaction between the two categories revealed a distinct pattern of disadvantage that became visible only when both were measured together. [Computer Science Classes]
The study therefore challenged a common assumption in AI evaluation. A system could appear highly accurate overall while still performing poorly for a relatively small subgroup. If that subgroup was underrepresented in testing data, its experience would barely affect the headline accuracy figure. [Proceedings of Machine Learning Research]
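The arithmetic behind this point is a weighted average: each subgroup contributes its error rate multiplied by its share of the test set. The sketch below is illustrative only. It borrows the 34.7% and 0.8% figures quoted earlier, and assumes a hypothetical test set in which darker-skinned women make up 5% of images and every other image sees the 0.8% rate; neither assumption comes from the study.

```python
def overall_error(shares, error_rates):
    """Aggregate error = sum over groups of (share of test set * group error rate)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(shares[g] * error_rates[g] for g in shares)


# Hypothetical test-set composition (an assumption for illustration).
shares = {"darker-skinned women": 0.05, "everyone else": 0.95}
# 34.7% and 0.8% are the figures quoted in the text; applying 0.8% to
# "everyone else" is a simplifying assumption, not a result from the study.
error_rates = {"darker-skinned women": 0.347, "everyone else": 0.008}

total = overall_error(shares, error_rates)
print(f"overall error:    {total:.2%}")
print(f"overall accuracy: {1 - total:.2%}")
```

Under these assumptions the headline error rate comes out at roughly 2.5%, so the system would report about 97.5% accuracy while misclassifying more than a third of the darker-skinned women in the test set.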
How intersectional results changed the debate
The most lasting contribution of Gender Shades was methodological. The study shifted attention from average performance to subgroup performance.
Before the study, discussions about facial-analysis bias often focused on anecdotal failures. Gender Shades provided systematic evidence. The researchers showed that the issue was measurable, reproducible and connected to how datasets and benchmarks were constructed. This changed the conversation from isolated mistakes to structural evaluation problems. [Proceedings of Machine Learning Research]
The study also demonstrated why intersectional testing matters. If researchers had only compared men and women, or only compared lighter and darker skin tones, they would have detected disparities but missed the full scale of the problem affecting darker-skinned women. The findings became one of the most widely cited examples of intersectional bias in machine learning and inspired later fairness audits across computer vision systems. [The Gradient]
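A small numerical sketch shows how single-axis comparisons can understate the largest gap. The four error rates below are hypothetical values chosen only to illustrate the effect (they are not the study's results), and the four subgroups are assumed to be equally sized so that each single-axis figure is a simple mean.

```python
# Hypothetical subgroup error rates, chosen only to illustrate the effect.
# Groups are assumed to be equally sized, so marginals are simple means.
rates = {
    ("lighter", "male"): 0.01,
    ("lighter", "female"): 0.07,
    ("darker", "male"): 0.12,
    ("darker", "female"): 0.35,
}


def marginal(rates, axis, value):
    """Mean error rate over subgroups whose key[axis] equals value."""
    selected = [r for key, r in rates.items() if key[axis] == value]
    return sum(selected) / len(selected)


# Single-axis comparisons: gender only (axis 1), then skin type only (axis 0).
gender_gap = marginal(rates, 1, "female") - marginal(rates, 1, "male")
skin_gap = marginal(rates, 0, "darker") - marginal(rates, 0, "lighter")
# Intersectional comparison: worst subgroup against best subgroup.
intersectional_gap = max(rates.values()) - min(rates.values())

print(f"gap by gender alone:    {gender_gap:.3f}")
print(f"gap by skin type alone: {skin_gap:.3f}")
print(f"largest subgroup gap:   {intersectional_gap:.3f}")
```

With these invented numbers, both single-axis gaps are real but noticeably smaller than the gap between the best-served and worst-served subgroups, which is the pattern the paragraph above describes.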
Another important consequence was increased scrutiny of benchmark design. Gender Shades argued that fairness cannot be evaluated solely through overall accuracy scores. Evaluation datasets must contain sufficient representation from different groups, and results should be reported separately rather than hidden inside aggregate metrics. [Proceedings of Machine Learning Research]
Why the evidence remains important
The headline comparison of 34.7% versus 0.8% became memorable because it illustrated how a seemingly successful AI system could produce very different experiences for different people. The study did not merely identify a technical flaw; it exposed a measurement blind spot. When benchmarks were dominated by lighter-skinned faces, systems could achieve impressive overall results while repeatedly failing darker-skinned women. [MIT News]
For understanding artificial intelligence, Gender Shades remains a foundational case because it showed that evaluating AI is not only about how often a model is correct. It is also about asking who is being measured, who is missing from the data, and whether performance is being examined across the groups most likely to reveal hidden weaknesses. The study made darker-skinned women’s error rates visible and, in doing so, changed expectations for how face AI systems should be tested and reported. [Proceedings of Machine Learning Research; Gender Shades]
Endnotes
-
Source: news.mit.edu
Title: study finds gender skin type bias artificial intelligence systems 0212
Link: https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212
Source snippet: MIT News. Study finds gender and skin-type bias in commercial... Feb 11, 2018. Examination of facial-analysis software shows error rate of...
-
Source: dspace.mit.edu
Title: 1026503582 MIT
Link: https://dspace.mit.edu/bitstream/handle/1721.1/114068/1026503582-MIT.pdf
Source snippet: IJB-A includes only 24.6% female and 4.4% darker female, and features 59.4% lighter...
-
Source: who.int
Link: https://www.who.int/health-topics/gender
Source snippet: Gender and health. Gender refers to the characteristics of women, men, girls and boys that are socially constructed. This includes norms, b...
-
Source: proceedings.mlr.press
Link: https://proceedings.mlr.press/v81/buolamwini18a.html
Source snippet: Proceedings of Machine Learning Research. Gender Shades: Intersectional Accuracy Disparities in... by J Buolamwini, 2018.
-
Source: proceedings.mlr.press
Link: https://proceedings.mlr.press/v81/buolamwini18a
Source snippet: The substantial disparities in the accuracy of classifying darker females, lighter females, darker...
-
Source: proceedings.mlr.press
Link: https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
Source snippet: Proceedings of Machine Learning Research. Gender Shades: Intersectional Accuracy Disparities in... by J Buolamwini, 2018.
-
Source: classes.cs.uchicago.edu
Title: Darker and Lighter Error Rates. To conduct a phenotypic performance analysis
Link: https://www.classes.cs.uchicago.edu/archive/2020/winter/20370-1/readings/gendershadesAIbias.pdf
Source snippet: Computer Science Classes. Gender Shades: Intersectional Accuracy Disparities in... by J Buolamwini. The FPR for females i...
-
Source: thegradient.pub
Title: gender bias in ai
Link: https://thegradient.pub/gender-bias-in-ai/
Source snippet: The Gradient. A Brief Overview of Gender Bias in AI. 8 Apr 2024. ... darker female faces (with error rates up to 34.7%). In contrast, the ma...
-
Source: gendershades.org
Link: https://gendershades.org/
Source snippet: Gender Shades. How well do IBM, Microsoft, and Face++ AI services guess the gender of a...
-
Source: studocu.com
Link: https://www.studocu.com/en-us/document/stanford-university/game-studies-issues-in-design-technology-and-player-creativity/gender-shades-intersectional-accuracy-disparities-in-ml-algorithms-mlr-2018/144319630
Source snippet: ...t accuracy disparities based on gender and skin type.
-
Source: gendershades.org
Link: https://gendershades.org/overview.html
Source snippet: Gender Shades. Table of subgroup error rates. IBM had the largest gap in accuracy, with a difference of 34.4% in error rate between li...
-
Source: ars.electronica.art
Link: https://ars.electronica.art/outofthebox/en/gender-shades/
Source snippet: Gender Shades – Out of the Box. The study reveals that popular applications that are already part of the programming display obvious discri...
-
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Gender
Source snippet: Gender. Gender is the range of social, psychological, cultural, and behavioral aspects of being a man (or boy), woman (or girl), or port...
-
Source: digitalgovernmenthub.org
Link: https://digitalgovernmenthub.org/library/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/
Source snippet: ...7%), while lighter-skinned males have much lower error rates (as low as 0.8%)...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/2505.20637
Source snippet: Gender Shades study exemplified this, showing error rates of 34.7% for darker-skinned women versus 0.8% for lighter-skinned men in...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=TWWsW1w-BVo
Source snippet: Gender Shades. The Gender Shades Project pilots an intersectional approach to inclusive product testing for AI. Gender Shades is a prelimin...
-
Source: scispace.com
Link: https://scispace.com/papers/gender-shades-intersectional-accuracy-disparities-in-4qgeu0c1i3
Source snippet: ...while the most accurate result is for light-skinned men, in commercial...
-
Source: opencasebook.org
Link: https://opencasebook.org/casebooks/2554-governing-digital-technology/resources/5.1.2.2-gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification-by-joy-buolamwini-and-timnit-gebru-conference-of-fairness-accountability-and-transparency-2018/
Source snippet: ...fication” by Joy Buolamwini and Timnit Gebru, Conference of Fairness...
-
Source: bibbase.org
Title: Buolamwini, J. & Gebru, T. Proceedings of Machine Learning Research.
Link: https://bibbase.org/network/publication/buolamwini-gebru-gendershadesintersectionalaccuracydisparitiesincommercialgenderclassification-2018
Source snippet: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classifi...