Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning

📅 2025-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Mainstream English automatic speech recognition (ASR) models such as Whisper and Seamless-M4T exhibit substantial disparities in word error rate (WER) across second-language speakers with diverse accents, indicating poor fairness. Method: This paper proposes a fairness-aware prompt-tuning framework that integrates spectral decoupling (SD), group distributionally robust optimization (Group-DRO), and invariant risk minimization (IRM) into lightweight adapter-based prompt tuning, optimizing for cross-accent performance parity beyond standard empirical risk minimization (ERM). Contribution/Results: Experiments show that the method reduces macro-averaged WER by 58.7% and 58.5% relative to pretrained Whisper and Seamless-M4T, respectively, significantly outperforming conventional fine-tuning. It substantially narrows inter-accent fairness gaps and establishes a new paradigm for building fair, robust multi-accent ASR systems.

📝 Abstract
In this work, we address the challenge of building fair English ASR systems for second-language speakers. Our analysis of the widely used ASR models Whisper and Seamless-M4T reveals large fluctuations in word error rate (WER) across 26 accent groups, indicating significant fairness gaps. To mitigate this, we propose fairness-prompted finetuning with lightweight adapters, incorporating Spectral Decoupling (SD), Group Distributionally Robust Optimization (Group-DRO), and Invariant Risk Minimization (IRM). Our proposed fusion of traditional empirical risk minimization (ERM) with cross-entropy and fairness-driven objectives (SD, Group-DRO, and IRM) enhances fairness across accent groups while maintaining overall recognition accuracy. In terms of macro-averaged word error rate, our approach achieves relative improvements of 58.7% and 58.5% over the large pretrained Whisper and Seamless-M4T models, and of 9.7% and 7.8% over the same models finetuned with standard empirical risk minimization using cross-entropy loss.
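The fused objective the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the step size and penalty coefficients are assumed values, the Group-DRO weights follow the standard exponentiated-gradient update (higher-loss accent groups are upweighted), and the SD and IRM penalty terms are assumed to be computed elsewhere (SD as a squared-logits penalty, IRM as a gradient penalty on a per-group dummy classifier scale) and passed in as scalars.

```python
import math

def group_dro_weights(group_losses, weights, step_size=0.01):
    """Exponentiated-gradient update used by Group-DRO:
    groups with higher loss receive larger weight, then the
    weights are renormalized to sum to 1."""
    updated = [w * math.exp(step_size * loss)
               for w, loss in zip(weights, group_losses)]
    total = sum(updated)
    return [w / total for w in updated]

def fairness_loss(group_losses, weights,
                  sd_penalty=0.0, irm_penalty=0.0,
                  lam_sd=0.1, lam_irm=0.1):
    """Group-weighted ERM (cross-entropy per accent group) plus
    SD and IRM regularizers, with illustrative coefficients."""
    weighted_erm = sum(w * loss for w, loss in zip(weights, group_losses))
    return weighted_erm + lam_sd * sd_penalty + lam_irm * irm_penalty
```

In a training loop, `group_dro_weights` would be applied after each batch so that the worst-performing accent groups dominate the next update, while the SD and IRM terms discourage the adapters from relying on accent-specific shortcut features.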
Problem

Research questions and friction points this paper is trying to address.

Addressing fairness gaps in English ASR for second-language speakers
Reducing word error rate fluctuations across diverse accent groups
Enhancing ASR fairness while maintaining overall recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fairness-prompted finetuning with lightweight adapters
Fusion of ERM with fairness-driven objectives
Enhances fairness across accent groups while maintaining accuracy