From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

📅 2025-05-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the low-resource challenge in classroom speech recognition, where abundant weakly labeled data coexists with scarce high-accuracy annotations, this paper proposes a Weakly Supervised Pretraining (WSP) paradigm. WSP first performs noise-robust supervised pretraining of end-to-end ASR models (Conformer and Whisper) on 5,000 hours of inexpensive weak-label data, then fine-tunes on a small set of gold-standard transcriptions. To mitigate label noise, WSP integrates label smoothing with curriculum learning. Experiments demonstrate that WSP consistently outperforms state-of-the-art semi-supervised and self-supervised methods under both realistic and synthetic weak-label settings, achieving an 18–24% relative reduction in word error rate. The approach delivers an efficient, production-ready weakly supervised solution for educational ASR applications.
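
The noise-mitigation recipe named above, label-smoothed targets plus a curriculum over the weak data, can be illustrated in a few lines. The following is a minimal PyTorch-style sketch; the smoothing value, the pacing schedule, and the confidence-based noise ranking are illustrative assumptions, not the paper's reported settings.

```python
import torch.nn as nn

# Label smoothing softens one-hot targets so the model does not fit
# erroneous weak-transcript tokens with full confidence.
# (0.1 is an assumed value, not the paper's reported setting.)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)

def curriculum_subset(weak_utterances, epoch, total_epochs):
    """Gradually admit noisier weak-label utterances as training proceeds.

    Assumes `weak_utterances` is pre-sorted cleanest-to-noisiest, e.g. by
    a decoder confidence score (a hypothetical noise proxy)."""
    frac = min(1.0, 0.5 + 0.5 * epoch / max(1, total_epochs - 1))
    return weak_utterances[: int(frac * len(weak_utterances))]
```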

📝 Abstract
Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data. In such low-resource settings, high transcription costs make re-transcription impractical. To address this, we ask: what is the best approach when abundant inexpensive weak transcripts coexist with limited gold-standard data, as is the case for classroom speech data? We propose Weakly Supervised Pretraining (WSP), a two-step process where models are first pretrained on weak transcripts in a supervised manner, and then fine-tuned on accurate data. Our results, based on both synthetic and real weak transcripts, show that WSP outperforms alternative methods, establishing it as an effective training methodology for low-resource ASR in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Utilizing noisy classroom transcripts with minimal accurate data
Improving ASR in low-resource settings with weak supervision
Optimizing model training when gold-standard data is scarce
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly Supervised Pretraining for ASR
Pretrain on weak transcripts first
Fine-tune with limited gold-standard data (see the sketch after this list)
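
As a rough illustration of this two-step schedule, the sketch below pretrains on the large weak-label set and then fine-tunes on the small gold set. The learning rates, epoch counts, and loader/model interfaces are assumptions for illustration, not the paper's configuration.

```python
import torch

def run_epochs(model, loader, criterion, optimizer, epochs, device="cpu"):
    """One supervised training loop over (features, token-target) batches."""
    model.train()
    for _ in range(epochs):
        for feats, targets in loader:
            feats, targets = feats.to(device), targets.to(device)
            logits = model(feats)                              # (B, T, vocab)
            loss = criterion(logits.transpose(1, 2), targets)  # CE over tokens
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def wsp_train(model, weak_loader, gold_loader, criterion, device="cpu"):
    # Step 1: supervised pretraining on abundant, noisy weak transcripts.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    run_epochs(model, weak_loader, criterion, opt, epochs=10, device=device)
    # Step 2: fine-tuning on the small gold-standard set at a lower LR.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    run_epochs(model, gold_loader, criterion, opt, epochs=3, device=device)
    return model
```
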
Ahmed Adel Attia
University of Maryland
Dorottya Demszky
Assistant Professor, Stanford University
natural language processing, education data science, teacher professional learning
Jing Liu
College of Education, University of Maryland
Carol Y. Espy-Wilson
Electrical and Computer Engineering, University of Maryland