Contrastive Regularization for Accent-Robust ASR

📅 2026-05-04

🤖 AI Summary
This work addresses the performance degradation of automatic speech recognition (ASR) systems on unseen or non-native accents. The authors propose a lightweight, model-agnostic regularizer that requires no accent labels: during CTC fine-tuning, an utterance-level supervised contrastive (SupCon) loss shapes the geometry of the encoder representations. The approach needs only a self-supervised pre-trained acoustic model and a standard CTC setup, with no architectural changes. On the L2-ARCTIC benchmark, it achieves up to a 29% relative reduction in word error rate under unseen-accent evaluation, and the learned representations are more compact and stable under accent variation.
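To make the idea of an utterance-level contrastive regularizer concrete, here is a minimal sketch of a SupCon-style loss over pooled utterance embeddings, combined with a CTC loss value. It is an illustration under stated assumptions, not the authors' implementation: the grouping key (`group_ids`, taken here to be utterances sharing a transcript), the `temperature` value, and the mixing `weight` are all hypothetical choices that the summary does not specify. Embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, group_ids, temperature=0.1):
    """SupCon-style loss over utterance-level embeddings.

    embeddings: list of equal-length float vectors, one per utterance.
    group_ids:  one label per utterance; utterances with the same label
                are treated as positives (ASSUMPTION: e.g. same transcript).
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and group_ids[j] == group_ids[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        # Temperature-scaled cosine similarity to every other utterance.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Average negative log-probability over this anchor's positives.
        per_anchor.append(
            -sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


def total_loss(ctc_loss, contrastive_loss, weight=0.1):
    """CTC objective plus a weighted contrastive term (weight is hypothetical)."""
    return ctc_loss + weight * contrastive_loss
```

In this sketch the loss is small when utterances in the same group sit close together in cosine space and far from other groups, which is the kind of compact geometry the summary describes.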
📝 Abstract
ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25 to 29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
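The abstract does not define within-transcript cosine dispersion. One plausible reading, sketched below, is the mean cosine distance (1 minus cosine similarity) over all pairs of utterance embeddings that share a transcript. The function name, the pairwise formulation, and the averaging across transcripts are assumptions made for illustration; the paper's exact definition may differ.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def within_transcript_dispersion(embeddings, transcripts):
    """Mean pairwise cosine distance among utterances with the same transcript.

    ASSUMPTION: dispersion = average of (1 - cosine similarity) over all
    same-transcript pairs. Lower values mean more compact representations.
    """
    groups = defaultdict(list)
    for emb, text in zip(embeddings, transcripts):
        groups[text].append(emb)
    distances = [1.0 - cosine(u, v)
                 for members in groups.values()
                 for u, v in combinations(members, 2)]
    # No same-transcript pairs means there is nothing to measure.
    return sum(distances) / len(distances) if distances else 0.0
```

Under this reading, a drop in the metric after SupCon fine-tuning would indicate that different speakers' renditions of the same sentence map to more similar encoder representations.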
Problem

Research questions and friction points this paper is trying to address.

accent robustness
automatic speech recognition
accent variability
self-supervised pretraining
CTC fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

supervised contrastive learning
accent robustness
CTC fine-tuning
self-supervised pretraining
representation regularization