ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models

📅 2026-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of pseudo-labeling in semi-supervised speech recognition, which is prone to confirmation bias and error accumulation that degrade model performance. To mitigate these issues, the authors propose ReHear, a novel framework that, for the first time, integrates an instruction-tuned audio-aware large language model into the self-training loop. By jointly modeling automatic speech recognition (ASR) hypotheses and raw audio inputs, ReHear enables iterative refinement and correction of phoneme-level pseudo-labels. This approach effectively suppresses error propagation and consistently outperforms both conventional pseudo-labeling baselines and fully supervised methods across multiple benchmark datasets, yielding significant improvements in end-to-end speech recognition accuracy.

📝 Abstract
Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative pseudo-label refinement that integrates an instruction-tuned, audio-aware large language model (LLM) into the self-training loop. Unlike conventional text-based correctors, our approach conditions the LLM on both the ASR hypothesis and the source audio, allowing it to recover phonetically accurate transcripts even from severe recognition errors. These refined pseudo-labels serve as high-fidelity targets for fine-tuning the ASR model in an iterative cycle. Experimental results across diverse benchmarks demonstrate that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling baselines.
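The self-training cycle described in the abstract (ASR produces hypotheses, an audio-aware LLM refines them using both the hypothesis and the source audio, and the refined transcripts become fine-tuning targets) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: every function name and data structure here is a hypothetical stand-in, with toy logic in place of real models.

```python
# Hypothetical sketch of an iterative pseudo-label refinement loop.
# All names are illustrative placeholders; the paper's actual models,
# prompts, and training procedure are not specified here.

def asr_transcribe(model, audio):
    # Stand-in for the ASR model producing a (possibly noisy) hypothesis.
    return model["noise"] + audio["text"]

def llm_refine(hypothesis, audio):
    # Stand-in for the instruction-tuned audio-aware LLM: conditioned on
    # BOTH the ASR hypothesis and the raw audio, it returns a corrected
    # transcript (idealized here as a perfect correction).
    return audio["text"]

def fine_tune(model, pseudo_labels):
    # Stand-in for fine-tuning the ASR model on the refined pseudo-labels
    # (idealized: training removes the model's systematic noise).
    model["noise"] = ""
    return model

def rehear_loop(model, unlabeled_audio, n_iters=2):
    # One outer iteration = transcribe -> refine -> fine-tune.
    for _ in range(n_iters):
        hypotheses = [asr_transcribe(model, a) for a in unlabeled_audio]
        refined = [llm_refine(h, a)
                   for h, a in zip(hypotheses, unlabeled_audio)]
        model = fine_tune(model, refined)
    return model
```

The point of the sketch is the loop structure: because the refiner sees the audio and not just the hypothesis text, each round's training targets can be more accurate than the model's own outputs, which is what lets the cycle suppress rather than amplify errors.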
Problem

Research questions and friction points this paper is trying to address.

semi-supervised learning
automatic speech recognition
pseudo-labeling
confirmation bias
error accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

pseudo-label refinement
audio large language model
semi-supervised speech recognition
iterative self-training
error propagation mitigation
Zefang Liu
Capital One, USA
Chenyang Zhu
Capital One, USA
Sangwoo Cho
Capital One
Shi-Xiong Zhang
Capital One, USA