A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

📅 2025-10-25
🤖 AI Summary
This study systematically evaluates racial bias in four major commercial automatic speech recognition (ASR) systems. Using the Pacific Northwest English (PNWE) corpus, which includes speakers from African American, Caucasian American, ChicanX, and Yakama backgrounds, the authors quantify cross-ethnic transcription disparities with a heuristically determined Phonetic Error Rate (PER) metric tied to sociophonetic annotations. An analysis of eleven sociophonetic features shows that vowel quality variation, particularly resistance to the low-back merger and pre-nasal merger patterns, is systematically associated with differential error rates, with the most pronounced effects for African American speakers across all evaluated systems. The authors conclude that acoustic modeling of dialectal phonetic variation, rather than lexical or syntactic factors, remains a primary source of bias. These findings underscore the need to represent dialectal diversity in ASR training and evaluation data to improve fairness and robustness in speech technology.

📝 Abstract
This paper presents a systematic evaluation of racial bias in four major commercial automatic speech recognition (ASR) systems using the Pacific Northwest English (PNWE) corpus. We analyze transcription accuracy across speakers from four ethnic backgrounds (African American, Caucasian American, ChicanX, and Yakama) and examine how sociophonetic variation contributes to differential system performance. We introduce a heuristically determined Phonetic Error Rate (PER) metric that links recognition errors to specific linguistically motivated variables derived from sociophonetic annotation. Our analysis of eleven sociophonetic features reveals that vowel quality variation, particularly resistance to the low-back merger and pre-nasal merger patterns, is systematically associated with differential error rates across ethnic groups, with the most pronounced effects for African American speakers across all evaluated systems. These findings demonstrate that acoustic modeling of dialectal phonetic variation, rather than lexical or syntactic factors, remains a primary source of bias in commercial ASR systems. The study establishes the PNWE corpus as a valuable resource for bias evaluation in speech technologies and provides actionable guidance for improving ASR performance through targeted representation of sociophonetic diversity in training data.
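The abstract does not define how PER is computed, so the following is only a minimal sketch of the general idea of linking recognition errors to annotated sociophonetic variables. It is not the authors' metric. It assumes each reference word carries a set of hypothetical feature tags (for example `low_back` or `pin_pen`), aligns the reference and ASR output with word-level Levenshtein alignment, and reports, per speaker group and feature, the fraction of tagged reference words that were not recognized correctly. The function names, tag names, and data layout are all illustrative assumptions.

```python
from collections import defaultdict


def align(ref, hyp):
    """Levenshtein-align two token lists.

    Returns one boolean per reference token: True if the token was
    matched exactly in the hypothesis, False if it was substituted
    or deleted.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    # Backtrace to find which reference tokens were matched.
    correct = [False] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            correct[i - 1] = cost == 0
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # reference token deleted
        else:
            j -= 1  # extra hypothesis token inserted
    return correct


def feature_error_rates(utterances):
    """Error rate per (group, feature) over feature-tagged reference words.

    `utterances` is an iterable of (group, ref_tokens, ref_features, hyp_tokens),
    where ref_features[k] is the set of hypothetical feature tags on ref_tokens[k].
    """
    tagged = defaultdict(int)
    errors = defaultdict(int)
    for group, ref, feats, hyp in utterances:
        for ok, tags in zip(align(ref, hyp), feats):
            for tag in tags:
                tagged[(group, tag)] += 1
                if not ok:
                    errors[(group, tag)] += 1
    return {key: errors[key] / tagged[key] for key in tagged}


# Toy, made-up example (not corpus data).
toy = [
    (
        "group_a",
        ["don", "caught", "the", "pen"],
        [{"low_back"}, {"low_back"}, set(), {"pin_pen"}],
        ["dawn", "caught", "the", "pin"],
    )
]
rates = feature_error_rates(toy)
print(rates)
```

In the toy utterance, one of the two `low_back` words and the single `pin_pen` word are misrecognized, so the sketch reports rates of 0.5 and 1.0 for those features. Comparing such per-feature rates across speaker groups is one plausible way to relate error rates to sociophonetic variables; the paper's actual heuristic may differ.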
Problem

Research questions and friction points this paper is trying to address.

Evaluating racial bias in commercial ASR systems across ethnic groups
Analyzing how sociophonetic variation causes differential ASR performance
Identifying phonetic variation as a primary source of ASR bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heuristically determined Phonetic Error Rate (PER) metric
Analysis of sociophonetic features for bias detection
Targeted representation of sociophonetic diversity in training data to improve ASR performance