🤖 AI Summary
To address the low-resource challenge in classroom speech recognition—where abundant weakly labeled data coexists with scarce high-accuracy annotations—this paper proposes a Weakly Supervised Pretraining (WSP) paradigm. WSP first performs noise-robust pretraining of end-to-end ASR models (Conformer and Whisper) on 5,000 hours of inexpensive weak-label data, followed by fine-tuning on a small set of gold-standard transcriptions. To mitigate label noise, WSP integrates label smoothing with curriculum learning. Experiments demonstrate that WSP consistently outperforms state-of-the-art semi-supervised and self-supervised methods under both realistic and synthetic weak-label settings, achieving an 18–24% relative reduction in word error rate. The approach delivers an efficient, production-ready weakly supervised solution tailored for educational ASR applications.
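The two noise-mitigation ingredients the summary mentions can be illustrated with a minimal, self-contained sketch. The function names, the uniform smoothing distribution, and the per-utterance noise score are illustrative assumptions, not the paper's actual implementation: label smoothing redistributes a small probability mass `eps` away from the (possibly wrong) weak label, and curriculum learning orders pretraining data from cleanest to noisiest.

```python
import math

def smoothed_cross_entropy(log_probs, target, eps=0.1):
    """Label-smoothed cross-entropy for a single token.

    log_probs: log-probabilities over the vocabulary (one value per class).
    target: index of the (possibly noisy) weak label.
    eps: probability mass spread uniformly over all classes, so a wrong
         weak label incurs a bounded penalty instead of driving the model
         to full confidence in it.
    Illustrative sketch only; the paper's exact loss may differ.
    """
    vocab = len(log_probs)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        # Smoothed target: (1 - eps) on the label, eps/V uniform elsewhere.
        q = (1.0 - eps) * (1.0 if i == target else 0.0) + eps / vocab
        loss -= q * lp
    return loss

def curriculum_order(utterances, noise_score):
    """Order pretraining utterances easy-to-hard by estimated label noise.

    noise_score is a hypothetical callable returning an estimate of how
    noisy an utterance's weak transcript is (lower = cleaner = earlier).
    """
    return sorted(utterances, key=noise_score)
```

With `eps=0` the loss reduces to ordinary cross-entropy; increasing `eps` raises the loss whenever the model concentrates all its probability on the weak label, which is the desired regularizing effect under noisy supervision.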
📝 Abstract
Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data. In such low-resource settings, high transcription costs make re-transcription impractical. To address this, we ask: what is the best approach when abundant inexpensive weak transcripts coexist with limited gold-standard data, as is the case for classroom speech data? We propose Weakly Supervised Pretraining (WSP), a two-step process where models are first pretrained on weak transcripts in a supervised manner, and then fine-tuned on accurate data. Our results, based on both synthetic and real weak transcripts, show that WSP outperforms alternative methods, establishing it as an effective training methodology for low-resource ASR in real-world scenarios.