🤖 AI Summary
This study investigates gender fairness in automatic speech recognition (ASR), moving beyond demographic parity to systematically analyze how the gender composition of training data and acoustic characteristics, particularly pitch variability, affect model bias. Using the LibriSpeech corpus and the Whisper-small model, we integrate acoustic feature analysis with multidimensional fairness quantification. Our findings reveal: (1) optimal gender fairness does not occur at a 50–50 training data split; instead, a specific, non-uniform gender ratio maximizes fairness; and (2) pitch variability is a critical acoustic driver of gender bias, an empirically validated relationship that prior work had not established. We characterize the nonlinear relationship between training data gender composition and ASR fairness, providing reproducible guidelines for data balancing and actionable strategies for bias mitigation. This work establishes a methodological foundation for developing equitable ASR systems.
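As a concrete illustration of what group-level fairness quantification can look like, here is a minimal sketch that compares word error rate (WER) across speaker genders and reports the gap between groups. It assumes per-utterance reference transcripts and ASR hypotheses are available and uses the `jiwer` library for WER; the function names and the simple WER-gap measure are illustrative assumptions, not the paper's exact metrics.

```python
# Illustrative sketch only: a WER gap between gender groups is one simple
# fairness measure; the paper's "multidimensional" quantification may differ.
from dataclasses import dataclass
import jiwer  # pip install jiwer


@dataclass
class Utterance:
    reference: str   # ground-truth transcript
    hypothesis: str  # ASR output (e.g., from Whisper-small)
    gender: str      # "F" or "M", from LibriSpeech speaker metadata


def wer_by_gender(utterances):
    """Compute WER separately for each gender group."""
    groups = {}
    for u in utterances:
        refs, hyps = groups.setdefault(u.gender, ([], []))
        refs.append(u.reference)
        hyps.append(u.hypothesis)
    return {g: jiwer.wer(refs, hyps) for g, (refs, hyps) in groups.items()}


def wer_gap(utterances):
    """Absolute WER difference between the two groups."""
    wers = wer_by_gender(utterances)
    return abs(wers["F"] - wers["M"])
```

Repeating this measurement across models trained on different gender ratios is one way to trace the fairness-versus-composition curve the summary describes.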
📝 Abstract
This study investigates factors influencing the fairness and performance of Automatic Speech Recognition (ASR) systems across genders, beyond the conventional examination of demographics. Using the LibriSpeech dataset and the Whisper-small model, we analyze how performance varies with different gender representations in the training data. Our findings suggest a complex interplay between the gender ratio in training data and ASR performance: optimal fairness occurs at specific gender distributions rather than a simple 50–50 split. Furthermore, acoustic factors such as pitch variability can significantly affect ASR accuracy. This research contributes to a deeper understanding of bias in ASR systems, highlighting the importance of carefully curated training data in mitigating gender bias.
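To make the pitch-variability factor concrete, below is a minimal sketch of how per-recording F0 variability might be measured, assuming librosa's pYIN pitch tracker and a standard-deviation summary; the exact acoustic features and tooling used in the study may differ.

```python
# Illustrative sketch only: one plausible way to quantify pitch variability
# for a LibriSpeech recording. Parameter choices (fmin/fmax) are assumptions.
import numpy as np
import librosa  # pip install librosa


def pitch_variability(path, fmin=65.0, fmax=400.0):
    """Return the standard deviation of voiced F0 (Hz) for one recording."""
    y, sr = librosa.load(path, sr=16000)  # LibriSpeech audio is 16 kHz
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]  # keep voiced frames only
    return float(np.std(voiced_f0)) if voiced_f0.size else float("nan")
```

Aggregating such per-speaker statistics and correlating them with per-speaker WER is one way to probe whether pitch variability tracks recognition errors, as the abstract suggests.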