MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR

📅 2025-05-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Automatic speech recognition (ASR) models exhibit poor cross-domain robustness and suffer from severe data scarcity for low-resource languages (e.g., Greek) under weak supervision. Method: We propose MSDA, a sample-efficient, two-stage domain adaptation framework. Stage I performs coarse-grained domain alignment leveraging wav2vec 2.0 self-supervised representations; Stage II refines the adaptation via consistency-regularized pseudo-labeling, curriculum-based sample selection, and progressive fine-tuning. Contribution/Results: MSDA is the first to systematically demonstrate that self-supervised pretraining and self-training must be decoupled into a staged, synergistic design rather than fused end-to-end. Evaluated on multiple cross-domain ASR benchmarks, MSDA achieves state-of-the-art performance, reducing average word error rate by 18.7% over the best baseline, and it exhibits significantly improved stability under noisy conditions and extremely low annotation budgets.
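To make the Stage II step concrete, below is a minimal sketch of confidence-filtered pseudo-labeling with a wav2vec 2.0 CTC model from HuggingFace transformers. The checkpoint, the 0.9 confidence threshold, and the utterance-level filtering rule are illustrative assumptions, not the paper's actual settings.

```python
# Illustrative sketch only: confidence-filtered pseudo-labeling with a
# wav2vec 2.0 CTC model. Checkpoint and threshold are stand-ins, not the
# paper's configuration.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "facebook/wav2vec2-base-960h"  # assumed checkpoint for the demo
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

@torch.no_grad()
def pseudo_label(batch_audio, sample_rate=16_000, min_confidence=0.9):
    """Decode unlabeled audio and keep only hypotheses whose mean per-frame
    probability clears a (hypothetical) confidence threshold."""
    inputs = processor(batch_audio, sampling_rate=sample_rate,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits             # (batch, frames, vocab)
    probs = logits.softmax(dim=-1)
    conf, ids = probs.max(dim=-1)               # greedy per-frame argmax
    texts = processor.batch_decode(ids)         # CTC-collapsed transcripts
    keep = conf.mean(dim=-1) >= min_confidence  # utterance-level filter
    return [(text, bool(flag)) for text, flag in zip(texts, keep)]
```

In the cascaded design, this filtering would run only after Stage I has domain-aligned the encoder, so the pseudo-labels are generated by an already-adapted model.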

📝 Abstract
In this work, we investigate the Meta PL unsupervised domain adaptation framework for Automatic Speech Recognition (ASR). We introduce a Multi-Stage Domain Adaptation pipeline (MSDA), a sample-efficient, two-stage adaptation approach that integrates self-supervised learning with semi-supervised techniques. MSDA is designed to enhance the robustness and generalization of ASR models, making them more adaptable to diverse conditions. It is particularly effective for low-resource languages like Greek and in weakly supervised scenarios where labeled data is scarce or noisy. Through extensive experiments, we demonstrate that Meta PL can be applied effectively to ASR tasks, achieving state-of-the-art results that significantly outperform prior methods and providing more robust solutions for unsupervised domain adaptation in ASR. Our ablations highlight the necessity of a cascading approach when combining self-supervision with self-training.
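Meta PL here follows the Meta Pseudo Labels idea (Pham et al., 2021): a student trains on a teacher's pseudo-labels, and the teacher is then updated according to how much that step improved the student on labeled data. Below is a toy first-order sketch; the linear stand-in models and the REINFORCE-style surrogate for the meta-gradient are simplifications for illustration, not the paper's implementation.

```python
# Toy, first-order illustration of a Meta Pseudo Labels-style update
# (after Pham et al., 2021). Linear models and the reward surrogate are
# simplifications, not the paper's ASR setup.
import torch
from torch import nn
from torch.nn.functional import cross_entropy

teacher, student = nn.Linear(16, 8), nn.Linear(16, 8)
t_opt = torch.optim.SGD(teacher.parameters(), lr=0.1)
s_opt = torch.optim.SGD(student.parameters(), lr=0.1)

x_u = torch.randn(4, 16)                                   # unlabeled batch
x_l, y_l = torch.randn(4, 16), torch.randint(0, 8, (4,))   # small labeled batch

# 1) Student takes a step on the teacher's pseudo-labels.
pseudo = teacher(x_u).argmax(dim=-1).detach()
loss_before = cross_entropy(student(x_l), y_l).item()
s_opt.zero_grad()
cross_entropy(student(x_u), pseudo).backward()
s_opt.step()

# 2) Teacher is rewarded by how much that step helped the student on
#    real labels (a crude stand-in for the exact meta-gradient).
reward = loss_before - cross_entropy(student(x_l), y_l).item()
t_opt.zero_grad()
(-reward * cross_entropy(teacher(x_u), pseudo)).backward()
t_opt.step()
```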
Problem

Research questions and friction points this paper is trying to address.

Enhancing ASR robustness via unsupervised domain adaptation
Adapting ASR models to low-resource languages like Greek
Combining self-supervision with semi-supervised learning for ASR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining pseudo-labeling and self-supervision for ASR
Two-stage adaptation with self-supervised learning
Cascading approach for robust domain adaptation (contrasted with joint training in the sketch below)
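The ablation's point, that self-supervision and self-training work better cascaded than mixed into a single objective, can be made concrete with a toy sketch; the losses and the tiny model below are placeholders, not ASR components.

```python
# Toy contrast between joint and cascaded training. The two losses stand in
# for the masked contrastive SSL objective and the CTC pseudo-label objective.
import torch
from torch import nn

encoder, head = nn.Linear(16, 16), nn.Linear(16, 8)
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)

def ssl_loss(x):                      # placeholder for the contrastive SSL loss
    return encoder(x).pow(2).mean()

def pseudo_label_loss(x, y):          # placeholder for CTC on pseudo-labels
    return nn.functional.cross_entropy(head(encoder(x)), y)

x, y = torch.randn(4, 16), torch.randint(0, 8, (4,))

# Joint: one mixed objective per step (what the ablations argue against).
opt.zero_grad()
(ssl_loss(x) + pseudo_label_loss(x, y)).backward()
opt.step()

# Cascaded: Stage I aligns the encoder first; Stage II self-trains on top.
opt.zero_grad(); ssl_loss(x).backward(); opt.step()               # Stage I
opt.zero_grad(); pseudo_label_loss(x, y).backward(); opt.step()   # Stage II
```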
Authors

Dimitrios Damianos
Speech and Language Processing Group, National Technical University of Athens, Greece; Institute for Language and Speech Processing, Athena Research Center, Greece

Georgios Paraskevopoulos
Associate Researcher, Institute for Speech and Language Processing, Athena RC
Multimodal Processing · Deep Learning · NLP · Domain adaptation

A. Potamianos
Speech and Language Processing Group, National Technical University of Athens, Greece