Confidence-Guided Error Correction for Disordered Speech Recognition

📅 2025-09-29
đŸ€– AI Summary
Automatic speech recognition (ASR) systems exhibit poor robustness for speakers with speech impairments, and LLM-based post-processing of their outputs tends toward over-correction. Method: We propose a confidence-guided, fine-grained prompting method that explicitly incorporates word-level ASR confidence scores into the large language model (LLM) post-processing pipeline, specifically LLaMA 3.1. An uncertainty-aware prompting mechanism directs the LLM's attention to low-confidence segments, while a subsequent filtering step suppresses excessive corrections. Contribution/Results: This work is the first to integrate ASR confidence as a structured prompt signal across both LLM training and inference stages, significantly enhancing cross-speaker and cross-dataset generalization. Evaluated on the Speech Accessibility Project and TORGO datasets, our method achieves relative word error rate (WER) reductions of 10% and 47%, respectively, outperforming baseline LLM-based correction and post-hoc filtering approaches.
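The uncertainty-aware prompting idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `<unc>...</unc>` tag format, the threshold value, and the instruction wording are all assumptions; the paper only states that word-level confidence scores are embedded as a structured prompt signal.

```python
def build_prompt(words, confidences, threshold=0.5):
    """Build a confidence-informed prompt: mark low-confidence ASR words
    so the LLM focuses its corrections there (hypothetical tag format)."""
    tagged = []
    for word, conf in zip(words, confidences):
        if conf < threshold:
            tagged.append(f"<unc>{word}</unc>")  # flag uncertain region
        else:
            tagged.append(word)
    hypothesis = " ".join(tagged)
    return (
        "Correct only the words marked <unc>...</unc> in this ASR "
        f"hypothesis; leave confident words unchanged:\n{hypothesis}"
    )
```

The same tagged format could be used in fine-tuning targets as well as at inference, which is what the paper means by using confidence across both stages.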

📝 Abstract
We investigate the use of large language models (LLMs) as post-processing modules for automatic speech recognition (ASR), focusing on their ability to perform error correction for disordered speech. In particular, we propose confidence-informed prompting, where word-level uncertainty estimates are embedded directly into LLM training to improve robustness and generalization across speakers and datasets. This approach directs the model to uncertain ASR regions and reduces overcorrection. We fine-tune a LLaMA 3.1 model and compare our approach to both transcript-only fine-tuning and post hoc confidence-based filtering. Evaluations show that our method achieves a 10% relative WER reduction compared to naive LLM correction on the Speech Accessibility Project spontaneous speech and a 47% reduction on TORGO, demonstrating the effectiveness of confidence-aware fine-tuning for impaired speech.
Problem

Research questions and friction points this paper is trying to address.

How can LLMs correct ASR errors in disordered speech?
How can word-level uncertainty be integrated into LLM training?
How can overcorrection be reduced by targeting uncertain ASR regions?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence-guided, fine-grained prompting that embeds word-level ASR confidence into LLM post-processing
Uncertainty-aware fine-tuning of LLaMA 3.1 using confidence signals at both training and inference
Post-correction filtering step that suppresses excessive corrections
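The filtering step that suppresses excessive corrections could look like the sketch below: align the LLM output against the original hypothesis and revert any edit that touches only high-confidence words. This alignment-based rule and the threshold are assumptions for illustration, not necessarily the paper's exact criterion.

```python
import difflib

def filter_corrections(orig_words, corrected_words, confidences, threshold=0.5):
    """Hypothetical post-hoc filter: keep an LLM edit only if it overlaps
    at least one low-confidence ASR word; otherwise revert to the original."""
    matcher = difflib.SequenceMatcher(a=orig_words, b=corrected_words)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(orig_words[i1:i2])
        elif any(c < threshold for c in confidences[i1:i2]):
            out.extend(corrected_words[j1:j2])  # edit touches uncertain words
        else:
            out.extend(orig_words[i1:i2])  # revert edit on confident words
    return out
```

For example, a fix of a low-confidence "too" to "to" survives the filter, while a spurious rewrite of a high-confidence word is reverted, which is the overcorrection behavior the paper aims to suppress.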
Abner Hernandez
Pattern Recognition Lab, Friedrich-Alexander-UniversitĂ€t Erlangen-NĂŒrnberg, Germany
TomĂĄs Arias Vergara
Pattern Recognition Lab, Friedrich-Alexander-UniversitĂ€t Erlangen-NĂŒrnberg, Germany; GITA Lab, Facultad de IngenierĂ­a, Universidad de Antioquia UdeA, MedellĂ­n, Colombia
Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander-UniversitĂ€t Erlangen-NĂŒrnberg, Germany
Paula Andrea Pérez-Toro
Friedrich-Alexander-UniversitĂ€t Erlangen-NĂŒrnberg; Universidad de Antioquia
Machine Learning · Speech Analysis · Gait Analysis · Natural Language Processing · Deep Learning