Why some voices confuse AI systems

Speech recognition can work unevenly when training recordings do not reflect the full range of voices it will encounter.

On this page

  • What representation means in speech data
  • How missing accents become higher error rates
  • What better coverage changes in practice

Introduction

Speech-recognition systems learn from recordings of human speech. As a result, the voices included in training data strongly influence which voices the system understands well. When certain accents are common in the dataset and others are rare, the system often becomes more accurate for the well-represented groups and less reliable for everyone else. This is one of the clearest examples of how training data shapes what an artificial intelligence system learns.

The issue is not that AI systems inherently dislike particular accents. Rather, machine-learning models learn statistical patterns from examples. If a model hears millions of examples of some pronunciation patterns but only a handful of others, it will usually become better at recognising the patterns it encounters most often. The result can be uneven performance across populations, exposing gaps in the data used to train the system. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021]

What representation means in speech data

For a speech-recognition system, representation means more than simply including many speakers. The training data must reflect the diversity of voices the system will encounter in the real world.

Accents differ in pronunciation, vowel quality, rhythm, stress patterns, and sometimes vocabulary. A speaker from Glasgow, Newcastle, Lagos, Mumbai, Texas, or Auckland may pronounce the same sentence in noticeably different ways. To a human listener these differences are often easy to understand because people have broad experience with linguistic variation. A machine-learning model can only learn variation that appears in its training examples.

This creates a practical challenge. Large speech datasets have historically been easier to collect from some populations than others. Speakers from dominant language groups, major urban centres, or regions with stronger technology industries are often overrepresented. Less common regional accents, minority dialects, and under-documented languages may appear far less frequently. [ACL Anthology: R. Ardila et al., Common Voice: A Massively-Multilingual Speech Corpus, 2020]
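One way to make this kind of imbalance visible is to tally the accent labels in a dataset's metadata. The sketch below is illustrative only: the accent labels, the counts, and the 5% threshold are invented for the example, and the helper names `accent_shares` and `underrepresented` are hypothetical rather than part of any real toolkit.

```python
from collections import Counter


def accent_shares(accent_labels):
    """Return each accent's share of the recordings as a fraction of the total."""
    counts = Counter(accent_labels)
    total = sum(counts.values())
    return {accent: n / total for accent, n in counts.items()}


def underrepresented(shares, min_share=0.05):
    """List accents whose share falls below an (assumed) minimum threshold."""
    return sorted(accent for accent, share in shares.items() if share < min_share)


# Hypothetical metadata: one accent label per recording (invented numbers).
labels = (["us"] * 70 + ["england"] * 20 + ["india"] * 7
          + ["scotland"] * 2 + ["nigeria"] * 1)

shares = accent_shares(labels)
flagged = underrepresented(shares)
print(flagged)  # ['nigeria', 'scotland']
```

A check like this only works when recordings carry accent labels at all. Where that metadata is missing or self-reported, the shares should be read as rough estimates.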

Researchers studying speech technology have repeatedly found that lack of dialectal and accent diversity in training corpora contributes to uneven performance. A 2021 analysis of a state-of-the-art speech-recognition system found that accuracy varied across English accents and tended to favour accents that were more prevalent in the training corpus. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021]

How missing accents become higher error rates

The connection between representation and performance is usually straightforward. Speech-recognition models learn relationships between sounds and words. When an accent uses sound patterns that appear infrequently during training, the model has less opportunity to learn them correctly.

Imagine a system trained primarily on speakers whose pronunciation of certain vowels follows one pattern. If a different accent uses those vowels differently, the model may misidentify words because it interprets the sounds through the lens of what it has already learned.
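The scenario above can be imitated with a deliberately tiny toy model. The sketch below is not how real speech recognisers work: it assumes a single made-up acoustic feature per vowel, invented values for two hypothetical accents ("X" and "Y"), and a nearest-centroid classifier that picks whichever word's average feature value is closest. It is only meant to show how skewed training data can pull a model's learned categories toward the majority accent.

```python
def train_centroids(examples):
    """examples: list of (feature_value, word). Returns {word: mean feature value}."""
    by_word = {}
    for value, word in examples:
        by_word.setdefault(word, []).append(value)
    return {word: sum(values) / len(values) for word, values in by_word.items()}


def predict(centroids, value):
    """Pick the word whose learned centroid is closest to the observed value."""
    return min(centroids, key=lambda word: abs(value - centroids[word]))


def accuracy(centroids, test_examples):
    correct = sum(predict(centroids, value) == word for value, word in test_examples)
    return correct / len(test_examples)


# Invented feature values: accent Y produces both vowels lower than accent X.
accent_x = [(390, "bit"), (400, "bit"), (410, "bit"), (400, "bit"),
            (590, "bet"), (600, "bet"), (610, "bet"), (600, "bet")]
accent_y = [(290, "bit"), (300, "bit"), (310, "bit"), (300, "bit"),
            (440, "bet"), (450, "bet"), (460, "bet"), (450, "bet")]
test_x = [(395, "bit"), (405, "bit"), (595, "bet"), (605, "bet")]
test_y = [(295, "bit"), (305, "bit"), (445, "bet"), (455, "bet")]

# Skewed training set: eight accent-X examples, only two from accent Y.
skewed = train_centroids(accent_x + [(300, "bit"), (450, "bet")])
# Balanced training set: eight examples from each accent.
balanced = train_centroids(accent_x + accent_y)

print(accuracy(skewed, test_x), accuracy(skewed, test_y))      # 1.0 0.5
print(accuracy(balanced, test_x), accuracy(balanced, test_y))  # 1.0 1.0
```

With the skewed data, accent Y's "bet" lands closer to the centroid the model learned for "bit", so it is misheard. In this toy case, adding more accent-Y examples shifts the centroids enough to fix the errors without hurting accent X. Real systems are far more complex, and more data does not always close the gap this cleanly.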

Researchers have documented this effect across multiple speech-recognition systems. One influential study examining commercial automatic speech-recognition services found substantial disparities in transcription accuracy. Across five major systems, the average word error rate was about 35% for Black speakers compared with 19% for white speakers. The researchers traced the gap to the underlying acoustic models used by the systems rather than to differences in the content of what people said, because the gap was just as large on a subset of identical phrases spoken by Black and white speakers. [PubMed, PMC: A. Koenecke et al., Racial Disparities in Automated Speech Recognition, 2020]
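Word error rate (WER), the metric quoted above, is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference: WER = (S + D + I) / N. The sketch below computes it with a standard word-level edit distance. It assumes simple lowercasing and whitespace tokenisation, which is not necessarily the normalisation used in the study, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("turn on the kitchen light",
                      "turn on the chicken light"))  # 0.2
```

On this scale, a WER of 0.35 means roughly one word in three needed correcting, against about one in five at 0.19.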

Subsequent work has reinforced the idea that pronunciation patterns and dialectal variation are major drivers of these errors. Studies examining sociophonetic features (the fine-grained sound characteristics of speech) have found that variation in vowel production and other accent-related features can systematically increase recognition errors when those patterns are insufficiently represented during training. [arXiv: A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus, October 26, 2025]

The problem is not limited to one dialect or country. Researchers have reported performance differences across national, regional, and ethnic varieties of English, showing that accuracy often declines as speech diverges from the varieties most common in training datasets. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021; ResearchGate]

Why these errors matter beyond convenience

At first glance, an accent-related transcription mistake may seem like a minor inconvenience. However, speech recognition increasingly sits inside services that people depend on every day.

Voice assistants, automated customer-service systems, captioning tools, accessibility software, education platforms, and healthcare documentation systems all rely on accurate speech recognition. When some groups experience significantly higher error rates, they face a larger burden when using those tools.

Research examining the human impact of speech-recognition failures found that repeated transcription errors can leave users feeling excluded and can encourage them to alter their speech in order to be understood by the technology. Participants reported feeling that the system was not designed with people like them in mind. [PMC: Z. Mengesha et al., Impact of Automated Speech Recognition Errors on African…, 2021]

This highlights an important point about training data gaps. They are not merely technical shortcomings. They can affect who benefits most from a technology and who must expend additional effort to use it successfully.

What better coverage changes in practice

Improving representation in speech datasets can substantially reduce these problems. The goal is not simply to gather more recordings, but to collect recordings that reflect the full diversity of real-world speech.

Several approaches are commonly used:

  • Expanding speaker diversity by including more regional, social, and ethnic varieties of speech.
  • Collecting data from underrepresented communities rather than relying primarily on readily available sources.
  • Measuring performance separately across accents so that overall accuracy does not hide disparities.
  • Building specialised evaluation datasets that test systems on a broad range of accents and dialects.
  • Creating open speech resources that allow researchers and developers to improve coverage collectively. [Holistic AI: Insightful Resources for Uncovering Bias in English…, Jan 27, 2023; ACL Anthology]
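The point about measuring performance separately across accents can be shown with a small calculation. The sketch below uses invented evaluation counts for two hypothetical groups, `accent_a` and `accent_b`. It assumes each result already records how many word errors the system made and how many words the reference transcript contained.

```python
from collections import defaultdict


def wer_by_group(results):
    """results: iterable of (group, word_errors, reference_words) tuples.

    Returns the pooled word error rate and a per-group breakdown.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, n_errors, n_words in results:
        errors[group] += n_errors
        words[group] += n_words
    per_group = {group: errors[group] / words[group] for group in words}
    overall = sum(errors.values()) / sum(words.values())
    return overall, per_group


# Hypothetical test set in which accent_a supplies 90% of the words.
results = [
    ("accent_a", 40, 800),
    ("accent_a", 50, 1000),
    ("accent_b", 30, 100),
    ("accent_b", 30, 100),
]

overall, per_group = wer_by_group(results)
print(round(overall, 3))                # 0.075
print(round(per_group["accent_a"], 3))  # 0.05
print(round(per_group["accent_b"], 3))  # 0.3
```

The pooled figure of 7.5% looks healthy, yet speakers in the smaller group see six times the error rate of the larger one. An evaluation set that mirrors the imbalance of the training data hides exactly the disparity it should reveal.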

Projects such as Mozilla’s Common Voice were created partly in response to the concentration of speech data among a limited set of languages and speaker groups. By crowdsourcing recordings from many communities, such projects seek to increase the diversity of speech data available for AI development. [Mozilla Common Voice]

Recent research continues to show that evaluating systems across different accents reveals weaknesses that might otherwise remain hidden. Studies of minority dialects and regional varieties consistently find that speech technologies perform more equitably when developers pay explicit attention to linguistic diversity during data collection and testing. [Georgia Tech Research: Minority English Dialects Vulnerable to Automatic Speech…, Nov 15, 2024; arXiv]

What accents reveal about AI learning

Underrepresented accents provide a particularly clear demonstration of a broader principle in artificial intelligence: models learn from the examples they see, not from an abstract understanding of the world.

When an AI system struggles with certain accents, the failure often points to a missing part of its experience rather than a flaw in the speakers themselves. The accent acts as a diagnostic tool, revealing where the training data does not adequately represent the population the system is expected to serve.

For this reason, accent-related performance gaps are valuable evidence when evaluating AI systems. They show how choices about data collection can become visible in real-world behaviour and why representative training data is essential for building systems that work reliably across diverse groups of people. [arXiv: English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System, May 9, 2021; PMC]

Endnotes

  1. Source: arxiv.org
    Link: https://arxiv.org/abs/2105.05041
    Source snippet

    English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition SystemMay 9, 2021...

    Published: May 9, 2021

  2. Source: researchgate.net
    Link: https://www.researchgate.net/publication/362430600_Performance_Disparities_Between_Accents_in_Automatic_Speech_Recognition
    Source snippet

    Performance Disparities Between Accents in Automatic...Aug 1, 2022 — Researchers have identified biases in ASR performance between parti...

  3. Source: commonvoice.mozilla.org
    Title: Common Voice Mozilla Common Voice
    Link: https://commonvoice.mozilla.org/

  4. Source: pmc.ncbi.nlm.nih.gov
    Title: Read more
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7149386/
    Source snippet

    Racial disparities in automated speech recognition - PMC - NIHby A Koenecke · 2020 · Cited by 1147 — We found that all five ASR system...

  5. Source: arxiv.org
    Link: https://arxiv.org/abs/2510.22495
    Source snippet

    A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English CorpusOctober 26, 2025...

    Published: October 26, 2025

  6. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8664002/
    Source snippet

    Impact of Automated Speech Recognition Errors on African...by Z Mengesha · 2021 · Cited by 170 — The results demonstrate that ASR fai...

  7. Source: holisticai.com
    Title: The selected datasets are well documented.Read more
    Link: https://www.holisticai.com/blog/uncovering-bias-english-speech-recognition
    Source snippet

    Holistic AIInsightful Resources for Uncovering Bias in English...Jan 27, 2023 — In this blog, we show you some great data sets to consid...

  8. Source: arxiv.org
    Link: https://arxiv.org/abs/2603.24549

  9. Source: arxiv.org
    Link: https://arxiv.org/html/2508.07143v1
    Source snippet

    Fairness of Automatic Speech Recognition10 Aug 2025 — When ASR misrecognizes speech, the burden of correction falls disproportionately on...

  10. Source: arxiv.org
    Link: https://arxiv.org/html/2510.22495v1
    Source snippet

    (2020) found that commercial ASR systems exhibit average word error rates of 35% for African American speakers compared to 19%...Read more...

  11. Source: arxiv.org
    Link: https://arxiv.org/abs/1912.06670
    Source snippet

    Common Voice: A Massively-Multilingual Speech Corpusby R Ardila · 2019 · Cited by 3040 — Abstract:The Common Voice corpus is a massively...

  12. Source: ar5iv.labs.arxiv.org
    Link: https://ar5iv.labs.arxiv.org/html/1912.06670
    Source snippet

    The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and developm...

  13. Source: researchgate.net
    Link: https://www.researchgate.net/publication/319185540_Effects_of_Talker_Dialect_Gender_Race_on_Accuracy_of_Bing_Speech_and_YouTube_Automatic_Captions
    Source snippet

    ms consistently show 5-10% higher error rates for Black speakers versus white...Read more...

  14. Source: researchgate.net
    Link: https://www.researchgate.net/publication/376941218_Accents_in_Speech_Recognition_through_the_Lens_of_a_World_Englishes_Evaluation_Set
    Source snippet

    (PDF) Accents in Speech Recognition through the Lens of...Automatic Speech Recognition (ASR) systems generalize poorly on accented speec...

  15. Source: mozilla.org
    Link: https://www.mozilla.org/
    Source snippet

    Internet for people, not profit — Mozilla GlobalMozilla is the not-for-profit behind the lightning fast Firefox browser. We put people ov...

  16. Source: mozilla.org
    Link: https://www.mozilla.org/en-US/
    Source snippet

    Internet for people, not profit (US)Jun 30, 2025 — Firefox: Get the gold standard for browsing with speed, privacy and control...

  17. Source: discourse.mozilla.org
    Title: dialects or language varaiants
    Link: https://discourse.mozilla.org/t/dialects-or-language-varaiants/138089
    Source snippet

    or Language Varaiants - Common Voice28 Dec 2024 — By actively involving speakers from all Burushaski dialects, you can help ensure that t...

  18. Source: commonvoice.mozilla.org
    Link: https://commonvoice.mozilla.org/en
    Source snippet

    Voice - MozillaCommon Voice is a free, open source platform for community-led data creation. Anyone can preserve, revitalise and elevate...

  19. Source: aclanthology.org
    Title: 2020.lrec 1.520
    Link: https://aclanthology.org/2020.lrec-1.520/
    Source snippet

    ACL AnthologyCommon Voice: A Massively-Multilingual Speech Corpusby R Ardila · 2020 · Cited by 3040 — The Common Voice corpus is a massiv...

  20. Source: pubmed.ncbi.nlm.nih.gov
    Title: Read more
    Link: https://pubmed.ncbi.nlm.nih.gov/32205437/
    Source snippet

    Racial Disparities in Automated Speech Recognitionby A Koenecke · 2020 · Cited by 1155 — We found that all five ASR systems exhibit...

  21. Source: gatech.edu
    Title: minority english dialects vulnerable automatic speech recognition inaccuracy
    Link: https://www.gatech.edu/news/2024/11/15/minority-english-dialects-vulnerable-automatic-speech-recognition-inaccuracy
    Source snippet

    Georgia Institute of TechnologyMinority English Dialects Vulnerable to Automatic Speech...Nov 15, 2024 — The Automatic Speech Recognitio...

  22. Source: research.gatech.edu
    Title: minority english dialects vulnerable automatic speech recognition inaccuracy
    Link: https://research.gatech.edu/minority-english-dialects-vulnerable-automatic-speech-recognition-inaccuracy
    Source snippet

    Georgia Tech ResearchMinority English Dialects Vulnerable to Automatic Speech...Nov 15, 2024 — While the models transcribed SAE-speaking...

  23. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12371062/
    Source snippet

    by SC Santos · 2025 · Cited by 1 — Research using AI voice cloning supports the idea that we may be biased to falsely discriminate the...

  24. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631515/
    Source snippet

    automatic speech recognition system performance...by M Zolnoori · 2024 · Cited by 23 — Further studies by Tatman and Wassink et al highl...

  25. Source: datacollective.mozillafoundation.org
    Link: https://datacollective.mozillafoundation.org/datasets
    Source snippet

    Mozilla Data CollectiveThe dataset comprises standardized lexical entries covering core vocabulary, function words, and culturally salien...

  26. Source: juliadiez.substack.com
    Title: mozilla common voice democratizing
    Link: https://juliadiez.substack.com/p/mozilla-common-voice-democratizing
    Source snippet

    Common Voice: Democratizing Speech Technology...Mozilla Common Voice is the world's most inclusive, open-source speech dataset—engaging...

  27. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/Mozilla
    Source snippet

    MozillaMozilla is a free software community founded in 1998 by members of Netscape. The Mozilla community uses, develops, publishes an...

  28. Source: easychair.org
    Link: https://easychair.org/publications/preprint/gFLz
    Source snippet

    Common Voice and Accent Choice: Data Contributors Self-...7 Feb 2023 — Datasets used as inputs for training speech models often represen...

  29. Source: kaggle.com
    Link: https://www.kaggle.com/datasets/mozillaorg/common-voice
    Source snippet

    Common VoiceCommon Voice is a corpus of speech data read by users on the Common Voice website (http://voice.mozilla.org/), and based upon...
