Language Bias in Self-Supervised Learning For Automatic Speech Recognition

📅 2024-12-02
🏛️ Spoken Language Technology Workshop
🤖 AI Summary
Problem: Multilingual self-supervised speech recognition (SSL-ASR) models such as XLS-R exhibit an implicit language bias: fine-tuning performance depends more on the volume of pretraining data from high-resource languages than on linguistic priors. Method: We introduce the Lottery Ticket Hypothesis (LTH) to SSL-ASR for the first time, proposing a language-specific subnetwork identification framework that integrates multilingual zero-shot transfer evaluation with fine-grained weight attribution analysis. Contribution/Results: Our empirical analysis reveals that during fine-tuning, XLS-R predominantly reuses weights learned from high-resource languages, which significantly degrades low-resource language subnetworks. This work provides the first systematic demonstration of a data-scale-driven language bias mechanism in SSL-ASR, uncovering previously overlooked harms of data imbalance. It offers theoretical insights and interpretable diagnostic tools for fair, robust multilingual speech modeling.

📝 Abstract
Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages with the largest data contribution to the pretraining data.
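As a rough illustration of the idea of language-specific subnetworks, the sketch below builds a binary pruning mask from weight magnitudes and measures how much two masks overlap. It is not the authors' procedure: it uses one-shot magnitude pruning on toy random weights rather than iterative pruning of XLS-R, and the names `magnitude_mask`, `mask_iou`, `lang_a`, and `lang_b` are hypothetical. Comparing masks by intersection-over-union is likewise an assumption made here for illustration.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a binary mask that keeps the largest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def mask_iou(mask_a, mask_b):
    """Intersection-over-union of the weights kept by two masks."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 1.0

# Toy stand-ins for weights after fine-tuning on two languages (hypothetical data).
rng = random.Random(0)
pretrained = [rng.gauss(0, 1) for _ in range(1000)]
lang_a = [w + rng.gauss(0, 0.1) for w in pretrained]
lang_b = [w + rng.gauss(0, 0.1) for w in pretrained]

mask_a = magnitude_mask(lang_a, sparsity=0.5)
mask_b = magnitude_mask(lang_b, sparsity=0.5)
overlap = mask_iou(mask_a, mask_b)
print(f"mask overlap (IoU): {overlap:.3f}")
```

A high overlap between masks obtained from different languages would be consistent with fine-tuning reusing largely the same weights regardless of the target language, which is the kind of effect the abstract describes.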
Problem

Research questions and friction points this paper is trying to address.

Multilingual Speech Recognition
Language Bias
Performance Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lottery Ticket Hypothesis
Multilingual Speech Recognition
Bias Analysis
Edward Storey
Sigmedia Lab, School of Engineering, Trinity College Dublin, Ireland; Centre for Speech Technology Research, School of Informatics, The University of Edinburgh, UK
Naomi Harte
Professor in Speech Technology, Trinity College Dublin
Audio-visual speech recognition, speech quality, multimodal interaction, birdsong analysis
Peter Bell
Centre for Speech Technology Research, School of Informatics, The University of Edinburgh, UK