🤖 AI Summary
To address the systematic accent bias induced by pseudo-labels in domain adaptation, this paper proposes a weight-space correction method that requires no target-domain labels. Specifically, two ASR models are fine-tuned from the same initialization on source-domain data, one on ground-truth labels and one on pseudo-labels; their weight difference forms a correction vector that is then injected into the target-domain pseudo-label model via task arithmetic to mitigate recurring accent-specific errors. This work is the first to apply task arithmetic to pseudo-label bias correction, eliminating reliance on target-domain ground-truth labels. Evaluated with Whisper-tiny across 10 African accents in AfriSpeech-200, the method achieves up to a 35% relative reduction in word error rate (WER), demonstrating substantial improvements in robustness and generalization, particularly in low-resource settings.
📝 Abstract
Robust ASR under domain shift is crucial because real-world systems encounter unseen accents and domains with limited labeled data. Although pseudo-labeling offers a practical workaround, it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: how can we correct these recurring biases without target ground truth? We propose a simple parameter-space correction: in a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels, and their weight difference forms a correction vector that captures pseudo-label biases. Applied to a pseudo-labeled target model, this vector improves recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper-tiny model.
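The correction described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: model weights are represented as plain dictionaries of NumPy arrays, and the function names (`correction_vector`, `apply_correction`) and the `scale` parameter are illustrative assumptions. The idea is the standard task-arithmetic recipe: subtract the pseudo-label model's weights from the ground-truth model's weights, then add the resulting vector to the target model.

```python
import numpy as np

def correction_vector(theta_gt, theta_pl):
    # Delta = theta_gt - theta_pl: per-parameter direction that
    # moves a pseudo-label-trained model toward its ground-truth-trained twin.
    return {k: theta_gt[k] - theta_pl[k] for k in theta_gt}

def apply_correction(theta_target, delta, scale=1.0):
    # Task arithmetic: add the (optionally scaled) correction vector
    # to the pseudo-labeled target model's weights.
    return {k: theta_target[k] + scale * delta[k] for k in theta_target}

# Toy one-tensor "models" standing in for full ASR checkpoints.
theta_gt     = {"w": np.array([1.0, 2.0])}  # fine-tuned on ground-truth labels
theta_pl     = {"w": np.array([1.5, 1.0])}  # fine-tuned on pseudo-labels
theta_target = {"w": np.array([0.5, 0.5])}  # target-domain pseudo-label model

delta = correction_vector(theta_gt, theta_pl)
corrected = apply_correction(theta_target, delta)
```

For a real checkpoint the same element-wise arithmetic would run over every tensor in the model's state dict; a scaling factor on the correction vector is a common task-arithmetic knob, though the abstract does not say whether one is used here.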