Exploring Generative Error Correction for Dysarthric Speech Recognition

📅 2025-05-26
🤖 AI Summary
To address the low speech recognition accuracy on dysarthric speech, this paper proposes a two-stage Generative Error Correction (GER) framework. First, an end-to-end ASR model (e.g., Whisper or Conformer) generates an initial transcription; second, a fine-tuned large language model (LLM) performs generative correction guided by semantic coherence and phonetic consistency. The key contribution is the first systematic integration of LLM-based generative correction into dysarthric ASR, complemented by a novel hypothesis selection strategy that improves robustness to phonetic variability. The analysis also reveals the complementary roles of acoustic and linguistic modeling. Evaluated on the Speech Accessibility Project dataset, GER significantly improves word-level accuracy on both structured and spontaneous utterances and corrects substitution, insertion, and deletion errors. However, single-word recognition remains challenging.
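Word-level accuracy and the substitution/insertion/deletion breakdown mentioned above are conventionally measured with word error rate (WER), computed from a minimum-cost word alignment between reference and hypothesis. The sketch below is a generic WER implementation, not the challenge's official scorer (which may apply its own text normalization; that is an assumption, since this page does not say). The function names `align_counts` and `wer` are hypothetical.

```python
def align_counts(reference: str, hypothesis: str) -> dict:
    """Count word substitutions (S), deletions (D), insertions (I) via Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # exact match costs nothing
                continue
            c, s, d, ins = dp[i - 1][j - 1]
            sub = (c + 1, s + 1, d, ins)
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            inse = (c + 1, s, d, ins + 1)
            dp[i][j] = min(sub, dele, inse)  # lowest total cost wins
    _, subs, dels, inss = dp[n][m]
    return {"S": subs, "D": dels, "I": inss, "N": n}


def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    c = align_counts(reference, hypothesis)
    if c["N"] == 0:
        raise ValueError("reference must contain at least one word")
    return (c["S"] + c["D"] + c["I"]) / c["N"]


print(align_counts("the cat sat on the mat", "the cat sit on mat"))
# {'S': 1, 'D': 1, 'I': 0, 'N': 6}
```

Because N is the reference length, a one-word reference turns any single error into a WER of 1.0, which is one reason single-word utterances are scored so harshly.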


📝 Abstract
Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines cutting-edge speech recognition models with LLM-based generative error correction (GER). We assess configurations with different model scales and training strategies and incorporate a specific hypothesis selection step to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition. Through comprehensive analysis, we provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.
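The two-stage idea can be sketched as: stage 1 produces ASR hypotheses, stage 2 hands them to an LLM that generates a corrected transcription. This page does not give the paper's prompt or input format, so the sketch assumes an N-best list as input (a common GER setup) and uses a made-up prompt template. `Hypothesis`, `build_ger_prompt`, `two_stage_transcribe`, and the injected `asr_nbest` / `llm_generate` callables are all hypothetical stand-ins for a real ASR model and fine-tuned LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical: ASR log-probability, higher is better


def build_ger_prompt(nbest: List[Hypothesis], max_hyps: int = 5) -> str:
    """Format an N-best list as an instruction for an LLM corrector (illustrative template)."""
    ranked = sorted(nbest, key=lambda h: h.score, reverse=True)[:max_hyps]
    lines = [
        "Below are candidate transcriptions of one dysarthric speech utterance,",
        "ranked by ASR confidence. Output the most likely true transcription.",
    ]
    for rank, hyp in enumerate(ranked, start=1):
        lines.append(f"{rank}. {hyp.text}")
    lines.append("Transcription:")
    return "\n".join(lines)


def two_stage_transcribe(
    audio,
    asr_nbest: Callable[[object], List[Hypothesis]],
    llm_generate: Callable[[str], str],
) -> str:
    """Stage 1: ASR N-best decoding. Stage 2: LLM generative error correction."""
    nbest = asr_nbest(audio)
    prompt = build_ger_prompt(nbest)
    return llm_generate(prompt).strip()


# Usage with stub models in place of a real ASR system and LLM:
def fake_asr(audio):
    return [Hypothesis("i want a coffee", -1.2), Hypothesis("i want to coffee", -0.8)]


print(build_ger_prompt(fake_asr(None)))
print(two_stage_transcribe(None, fake_asr, lambda prompt: " i want a coffee \n"))
```

Injecting the two models as callables keeps the pipeline logic testable without loading any weights; swapping in an actual Whisper or Conformer decoder and an LLM client only changes the functions passed in.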
Problem

Research questions and friction points this paper is trying to address.

Improving dysarthric speech recognition accuracy
Combining ASR with LLM-based error correction
Addressing challenges in single-word recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework combining ASR and LLM-based GER
Model scale and training strategy optimization
Specific hypothesis selection for accuracy improvement
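The page names "specific hypothesis selection" without describing the rule, so the sketch below is only one plausible, hypothetical selection scheme, not the paper's method: accept the LLM's corrected transcription unless it rewrites too large a fraction of the ASR 1-best, in which case fall back to the ASR output as a guard against LLM hallucination. `select_hypothesis`, `word_edit_distance`, and the `max_change=0.5` threshold are illustrative assumptions.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over words, using a rolling row to save memory."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete wa
                           cur[j - 1] + 1,            # insert wb
                           prev[j - 1] + (wa != wb)))  # match or substitute
        prev = cur
    return prev[-1]


def select_hypothesis(asr_best: str, llm_output: str, max_change: float = 0.5) -> str:
    """Keep the LLM correction only if it edits at most `max_change` of the ASR words."""
    asr_words, llm_words = asr_best.split(), llm_output.split()
    if not llm_words:
        return asr_best  # empty generation: fall back to ASR
    change = word_edit_distance(asr_words, llm_words) / max(len(asr_words), 1)
    return llm_output if change <= max_change else asr_best


print(select_hypothesis("i want to go the store", "i want to go to the store"))
print(select_hypothesis("turn on the light", "the weather is nice today"))
```

With this default threshold, any change to a one-word ASR hypothesis yields a change ratio of 1.0 and is rejected, so a real system would likely need a different policy for short utterances.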