Why some voices confuse AI systems

Introduction

Speech-recognition systems learn from recordings of human speech. As a result, the voices included in training data strongly influence which voices the system understands well. When certain accents are common in the dataset and others are rare, the system often becomes more accurate for the well-represented groups and less reliable for everyone else. This is one of the clearest examples of how training data shapes what an artificial intelligence system learns.

The issue is not that AI systems inherently dislike particular accents. Rather, machine-learning models learn statistical patterns from examples. If a model hears millions of examples of some pronunciation patterns but only a handful of others, it will usually become better at recognising the patterns it encounters most often. The result can be uneven performance across populations, exposing gaps in the data used to train the system. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021]

What representation means in speech data

For a speech-recognition system, representation means more than simply including many speakers. The training data must reflect the diversity of voices the system will encounter in the real world.

Accents differ in pronunciation, vowel quality, rhythm, stress patterns, and sometimes vocabulary. A speaker from Glasgow, Newcastle, Lagos, Mumbai, Texas, or Auckland may pronounce the same sentence in noticeably different ways. To a human listener these differences are often easy to understand because people have broad experience with linguistic variation. A machine-learning model can only learn variation that appears in its training examples.

This creates a practical challenge. Large speech datasets have historically been easier to collect from some populations than others. Speakers from dominant language groups, major urban centres, or regions with stronger technology industries are often overrepresented. Less common regional accents, minority dialects, and under-documented languages may appear far less frequently. [ACL Anthology: R. Ardila, Common Voice: A Massively-Multilingual Speech Corpus, 2020]

Researchers studying speech technology have repeatedly found that a lack of dialectal and accent diversity in training corpora contributes to uneven performance. A 2021 analysis of a state-of-the-art speech-recognition system found that accuracy varied across English accents and tended to favour accents that were more prevalent in the training corpus. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021]

How missing accents become higher error rates

The connection between representation and performance is usually straightforward. Speech-recognition models learn relationships between sounds and words. When an accent uses sound patterns that appear infrequently during training, the model has less opportunity to learn them correctly.

Imagine a system trained primarily on speakers whose pronunciation of certain vowels follows one pattern. If a different accent uses those vowels differently, the model may misidentify words because it interprets the sounds through the lens of what it has already learned.
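The thought experiment above can be made concrete with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the single numeric "vowel feature", the values assigned to each accent, the word labels, and the nearest-centroid classifier, which is far simpler than any real acoustic model. It only shows how a skewed training mix can pull the learned pattern toward the majority accent.

```python
# Hypothetical one-dimensional "vowel feature" values for two words.
# Accent 1 and accent 2 realise the same words with shifted values.
ACCENT_1 = {"word_a": 300.0, "word_b": 600.0}
ACCENT_2 = {"word_a": 500.0, "word_b": 800.0}


def make_training_set(n_accent_1, n_accent_2):
    """Build (feature, word) pairs with the given number of examples per word per accent."""
    data = []
    for word in ("word_a", "word_b"):
        data += [(ACCENT_1[word], word)] * n_accent_1
        data += [(ACCENT_2[word], word)] * n_accent_2
    return data


def fit_centroids(data):
    """Learn one mean feature value per word from the training pairs."""
    sums, counts = {}, {}
    for value, word in data:
        sums[word] = sums.get(word, 0.0) + value
        counts[word] = counts.get(word, 0) + 1
    return {word: sums[word] / counts[word] for word in sums}


def predict(centroids, value):
    """Pick the word whose learned centroid is closest to the observed value."""
    return min(centroids, key=lambda word: abs(centroids[word] - value))


def accuracy(centroids, accent):
    """Fraction of an accent's words that the classifier labels correctly."""
    correct = sum(predict(centroids, v) == w for w, v in accent.items())
    return correct / len(accent)


for label, (n1, n2) in {"skewed": (95, 5), "balanced": (50, 50)}.items():
    centroids = fit_centroids(make_training_set(n1, n2))
    print(label, "accent 1:", accuracy(centroids, ACCENT_1),
          "accent 2:", accuracy(centroids, ACCENT_2))
```

With the 95-to-5 mix, the learned centroids sit close to accent 1's values, so accent 2's pronunciation of word_a lands nearer the word_b centroid and is mislabelled (accuracy 0.5 for accent 2 against 1.0 for accent 1). With the balanced mix, both accents are labelled correctly in this toy setup. Real speech is much less tidy, so balanced data should not be expected to remove every error.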

Researchers have documented this effect across multiple speech-recognition systems. One influential study examining commercial automatic speech-recognition services found substantial disparities in transcription accuracy. Across five major systems, average word error rates were about 35% for Black speakers compared with 19% for white speakers. The researchers concluded that the gap was linked to the underlying acoustic models used by the systems rather than differences in the content of what people said. [PubMed; PMC: A. Koenecke, Racial Disparities in Automated Speech Recognition, 2020]
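Word error rate, the metric behind the 35% and 19% figures, is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition; the function name and the example sentences are made up for illustration, and the cited study's own tooling and text normalisation are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: two of six reference words are transcribed incorrectly.
reference = "please call the pharmacy before noon"
hypothesis = "please call the farm before new"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # WER: 0.33
```

A rate of 0.35 therefore means roughly one error for every three words spoken, which is enough to change the meaning of many sentences.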

Subsequent work has reinforced the idea that pronunciation patterns and dialectal variation are major drivers of these errors. Studies examining sociophonetic features (the fine-grained sound characteristics of speech) have found that variation in vowel production and other accent-related features can systematically increase recognition errors when those patterns are insufficiently represented during training. [arXiv: A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus, October 26, 2025]

The problem is not limited to one dialect or country. Researchers have reported performance differences across national, regional, and ethnic varieties of English, showing that accuracy often declines as speech diverges from the varieties most common in training datasets. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021; ResearchGate]

Why these errors matter beyond convenience

At first glance, an accent-related transcription mistake may seem like a minor inconvenience. However, speech recognition increasingly sits inside services that people depend on every day.

Voice assistants, automated customer-service systems, captioning tools, accessibility software, education platforms, and healthcare documentation systems all rely on accurate speech recognition. When some groups experience significantly higher error rates, they face a larger burden when using those tools.

Research examining the human impact of speech-recognition failures found that repeated transcription errors can leave users feeling excluded and can encourage them to alter their speech in order to be understood by the technology. Participants reported feeling that the system was not designed with people like them in mind. [PMC: Z. Mengesha, Impact of Automated Speech Recognition Errors on African…, 2021]

This highlights an important point about training data gaps. They are not merely technical shortcomings. They can affect who benefits most from a technology and who must expend additional effort to use it successfully.

What better coverage changes in practice

Improving representation in speech datasets can substantially reduce these problems. The goal is not simply to gather more recordings, but to collect recordings that reflect the full diversity of real-world speech.

Several approaches are commonly used:

Expanding speaker diversity by including more regional, social, and ethnic varieties of speech.
Collecting data from underrepresented communities rather than relying primarily on readily available sources.
Measuring performance separately across accents so that overall accuracy does not hide disparities.
Building specialised evaluation datasets that test systems on a broad range of accents and dialects.
Creating open speech resources that allow researchers and developers to improve coverage collectively. [Holistic AI: Insightful Resources for Uncovering Bias in English…, Jan 27, 2023; ACL Anthology]
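The point in the list above about measuring performance separately across accents can be sketched in a few lines. The accent labels and error counts below are invented purely for illustration and are not taken from any study; the sketch only shows why a single overall figure can hide a large gap when one group contributes most of the test data.

```python
from collections import defaultdict

# Hypothetical evaluation results: (accent label, word errors, reference words).
# The labels and counts are invented for illustration only.
results = [
    ("accent_a", 40, 1000),
    ("accent_a", 50, 1000),
    ("accent_a", 45, 1000),
    ("accent_a", 45, 1000),
    ("accent_b", 150, 500),
]


def wer_by_group(rows):
    """Return the overall word error rate and a per-group breakdown."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, n_errors, n_words in rows:
        errors[group] += n_errors
        words[group] += n_words
    per_group = {group: errors[group] / words[group] for group in words}
    overall = sum(errors.values()) / sum(words.values())
    return overall, per_group


overall, per_group = wer_by_group(results)
print(f"overall: {overall:.3f}")
for group, wer in sorted(per_group.items()):
    print(f"{group}: {wer:.3f}")
```

In this made-up example the overall rate is about 0.073, which looks acceptable, while the breakdown shows 0.045 for accent_a and 0.300 for accent_b. Reporting only the aggregate would conceal that the smaller group sees more than six times the error rate.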

Projects such as Mozilla’s Common Voice were created partly in response to the concentration of speech data among a limited set of languages and speaker groups. By crowdsourcing recordings from many communities, such projects seek to increase the diversity of speech data available for AI development. [Common Voice: commonvoice.mozilla.org]

Recent research continues to show that evaluating systems across different accents reveals weaknesses that might otherwise remain hidden. Studies of minority dialects and regional varieties consistently find that speech technologies perform more equitably when developers pay explicit attention to linguistic diversity during data collection and testing. [Georgia Tech Research: Minority English Dialects Vulnerable to Automatic Speech…, Nov 15, 2024; arXiv]

What accents reveal about AI learning

Underrepresented accents provide a particularly clear demonstration of a broader principle in artificial intelligence: models learn from the examples they see, not from an abstract understanding of the world.

When an AI system struggles with certain accents, the failure often points to a missing part of its experience rather than a flaw in the speakers themselves. The accent acts as a diagnostic tool, revealing where the training data does not adequately represent the population the system is expected to serve.

For this reason, accent-related performance gaps are valuable evidence when evaluating AI systems. They show how choices about data collection can become visible in real-world behaviour and why representative training data is essential for building systems that work reliably across diverse groups of people. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021; PMC]

Endnotes

Source: arxiv.org
Link: https://arxiv.org/abs/2105.05041
Source snippet
English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition SystemMay 9, 2021...

Published: May 9, 2021
Source: researchgate.net
Link: https://www.researchgate.net/publication/362430600_Performance_Disparities_Between_Accents_in_Automatic_Speech_Recognition
Source snippet
Performance Disparities Between Accents in Automatic...Aug 1, 2022 — Researchers have identified biases in ASR performance between parti...
Source: commonvoice.mozilla.org
Title: Common Voice Mozilla Common Voice
Link: https://commonvoice.mozilla.org/
Source: pmc.ncbi.nlm.nih.gov
Title: Read more
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7149386/
Source snippet
Racial disparities in automated speech recognition - PMC - NIH, by A Koenecke · 2020 · Cited by 1147: We found that all five ASR system...
Source: arxiv.org
Link: https://arxiv.org/abs/2510.22495
Source snippet
A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English CorpusOctober 26, 2025...

Published: October 26, 2025
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8664002/
Source snippet
Impact of Automated Speech Recognition Errors on African...by Z Mengesha · 2021 · Cited by 170 — The results demonstrate that ASR fai...
Source: holisticai.com
Title: The selected datasets are well documented.Read more
Link: https://www.holisticai.com/blog/uncovering-bias-english-speech-recognition
Source snippet
Holistic AIInsightful Resources for Uncovering Bias in English...Jan 27, 2023 — In this blog, we show you some great data sets to consid...
Source: arxiv.org
Link: https://arxiv.org/abs/2603.24549
Source: arxiv.org
Link: https://arxiv.org/html/2508.07143v1
Source snippet
Fairness of Automatic Speech Recognition10 Aug 2025 — When ASR misrecognizes speech, the burden of correction falls disproportionately on...
Source: arxiv.org
Link: https://arxiv.org/html/2510.22495v1
Source snippet
(2020) found that commercial ASR systems exhibit average word error rates of 35% for African American speakers compared to 19%...Read more...
Source: arxiv.org
Link: https://arxiv.org/abs/1912.06670
Source snippet
Common Voice: A Massively-Multilingual Speech Corpus, by R Ardila · 2019 · Cited by 3040. Abstract: The Common Voice corpus is a massively...
Source: ar5iv.labs.arxiv.org
Link: https://ar5iv.labs.arxiv.org/html/1912.06670
Source snippet
The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and developm...
Source: researchgate.net
Link: https://www.researchgate.net/publication/319185540_Effects_of_Talker_Dialect_Gender_Race_on_Accuracy_of_Bing_Speech_and_YouTube_Automatic_Captions
Source snippet
ms consistently show 5-10% higher error rates for Black speakers versus white...Read more...
Source: researchgate.net
Link: https://www.researchgate.net/publication/376941218_Accents_in_Speech_Recognition_through_the_Lens_of_a_World_Englishes_Evaluation_Set
Source snippet
(PDF) Accents in Speech Recognition through the Lens of...Automatic Speech Recognition (ASR) systems generalize poorly on accented speec...
Source: mozilla.org
Link: https://www.mozilla.org/
Source snippet
Internet for people, not profit — Mozilla GlobalMozilla is the not-for-profit behind the lightning fast Firefox browser. We put people ov...
Source: mozilla.org
Link: https://www.mozilla.org/en-US/
Source snippet
Internet for people, not profit (US), Jun 30, 2025. Firefox: Get the gold standard for browsing with speed, privacy and control...
Source: discourse.mozilla.org
Title: dialects or language varaiants
Link: https://discourse.mozilla.org/t/dialects-or-language-varaiants/138089
Source snippet
or Language Varaiants - Common Voice28 Dec 2024 — By actively involving speakers from all Burushaski dialects, you can help ensure that t...
Source: commonvoice.mozilla.org
Link: https://commonvoice.mozilla.org/en
Source snippet
Voice - MozillaCommon Voice is a free, open source platform for community-led data creation. Anyone can preserve, revitalise and elevate...
Source: aclanthology.org
Title: 2020.lrec 1.520
Link: https://aclanthology.org/2020.lrec-1.520/
Source snippet
ACL AnthologyCommon Voice: A Massively-Multilingual Speech Corpusby R Ardila · 2020 · Cited by 3040 — The Common Voice corpus is a massiv...
Source: pubmed.ncbi.nlm.nih.gov
Title: Read more
Link: https://pubmed.ncbi.nlm.nih.gov/32205437/
Source snippet
Racial Disparities in Automated Speech Recognitionby A Koenecke · 2020 · Cited by 1155 — We found that all five ASR systems exhibit...
Source: gatech.edu
Title: minority english dialects vulnerable automatic speech recognition inaccuracy
Link: https://www.gatech.edu/news/2024/11/15/minority-english-dialects-vulnerable-automatic-speech-recognition-inaccuracy
Source snippet
Georgia Institute of TechnologyMinority English Dialects Vulnerable to Automatic Speech...Nov 15, 2024 — The Automatic Speech Recognitio...
Source: research.gatech.edu
Title: minority english dialects vulnerable automatic speech recognition inaccuracy
Link: https://research.gatech.edu/minority-english-dialects-vulnerable-automatic-speech-recognition-inaccuracy
Source snippet
Georgia Tech ResearchMinority English Dialects Vulnerable to Automatic Speech...Nov 15, 2024 — While the models transcribed SAE-speaking...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12371062/
Source snippet
by SC Santos · 2025 · Cited by 1 — Research using AI voice cloning supports the idea that we may be biased to falsely discriminate the...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631515/
Source snippet
automatic speech recognition system performance...by M Zolnoori · 2024 · Cited by 23 — Further studies by Tatman and Wassink et al highl...
Source: datacollective.mozillafoundation.org
Link: https://datacollective.mozillafoundation.org/datasets
Source snippet
Mozilla Data CollectiveThe dataset comprises standardized lexical entries covering core vocabulary, function words, and culturally salien...
Source: juliadiez.substack.com
Title: mozilla common voice democratizing
Link: https://juliadiez.substack.com/p/mozilla-common-voice-democratizing
Source snippet
Common Voice: Democratizing Speech Technology...Mozilla Common Voice is the world's most inclusive, open-source speech dataset—engaging...
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Mozilla
Source snippet
MozillaMozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, publishes an...
Source: easychair.org
Link: https://easychair.org/publications/preprint/gFLz
Source snippet
Common Voice and Accent Choice: Data Contributors Self-..., 7 Feb 2023. Datasets used as inputs for training speech models often represen...
Source: kaggle.com
Link: https://www.kaggle.com/datasets/mozillaorg/common-voice
Source snippet
Common Voice: Common Voice is a corpus of speech data read by users on the Common Voice website (http://voice.mozilla.org/), and based upon...

Additional References

Source: nist.gov
Link: https://www.nist.gov/publications/openasr20-open-challenge-automatic-speech-recognition-ofconversational-telephone-speech
Source snippet
OpenASR20: An Open Challenge for Automatic Speech...by K Peterson · 2021 · Cited by 11 — The results show overall high word error rate (...
Source: firefox.com
Link: https://www.firefox.com/en-US/download/all/
Source snippet
Choose which Firefox Browser to download in your languageChoose which Firefox Browser to download in your language. Everyone deserves acc...
Source: mozilladatacollective.com
Link: https://mozilladatacollective.com/datasets
Source snippet
DatasetsTupuri (tui) scripted speech dataset: 1,800 clips (≈2h23min) from 16 speakers across two dialect subsets — Bango and Banwere with...
Source: fairspeech.stanford.edu
Link: https://fairspeech.stanford.edu/
Source snippet
The Race Gap in Speech Recognition TechnologyWe found that all five services showed significant racial disparities. Average err...
Source: firefox.com
Link: https://www.firefox.com/
Source snippet
Get Firefox for desktop and mobile — Firefox.comFirefox is a free web browser backed by Mozilla, a non-profit dedicated to internet healt...
Source: facebook.com
Link: https://www.facebook.com/HowardU/posts/a-lack-of-african-american-english-data-within-automated-speech-recognition-syst/1059645339526617/
Source snippet
Howard UniversityA lack of African American English data within automated speech recognition systems means Black users experience signifi...
Source: nist.gov
Title: study evaluates effects race age sex face recognition software
Link: https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-face-recognition-software
Source snippet
NIST Study Evaluates Effects of Race, Age, Sex on Face...Dec 19, 2019 — A new NIST study examines how accurately face recognition softwa...
Source: idtechwire.com
Title: study finds racial bias in leading speech recognition systems 903246
Link: https://idtechwire.com/study-finds-racial-bias-in-leading-speech-recognition-systems-903246/
Source snippet
Study Finds Racial Bias in Leading Speech Recognition...Mar 24, 2020 — The researchers speculate that the problem arises from the use of...
Source: nhsjs.com
Link: https://nhsjs.com/2025/evaluating-the-accessibility-of-automatic-speech-recognition-technology-across-accents/
Source snippet
y in ASR systems' ability to accurately transcribe content from diverse accents...Read more...
Source: kerson.ai
Link: https://kerson.ai/research/accent-bias-in-speech-recognition-challenges-impacts-and-solutions/
Source snippet
osoft, Apple) found nearly double the error rate for African American speakers...Read more...
