Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automatic speech recognition (ASR) systems are usually evaluated by word error rate (WER) alone, a metric that does not capture whether the meaning of an utterance is preserved. The gap is especially costly for dysarthric speech, where recognition errors can severely distort semantics. To address this, the authors propose a large language model (LLM)-based Judge-Editor agent that performs semantics-aware local rewriting over the ASR system's top-k hypotheses: it retains high-confidence segments, revises uncertain ones, and operates in both zero-shot and fine-tuned modes. The work is presented as the first application of an LLM agent to semantic-level post-correction of dysarthric speech. The study also introduces SAP-Hypo5, described as the largest benchmark to date for dysarthric speech correction, to support multi-perspective evaluation beyond WER. Experiments report a 14.51% WER reduction, together with gains of 7.59 percentage points in MENLI and 7.66 percentage points in Slot Micro F1 on challenging samples. The analysis further finds that WER is highly sensitive to domain shift, whereas semantic metrics correlate more closely with downstream task performance.
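
To make the WER limitation concrete, here is a minimal sketch of the standard WER computation (word-level edit distance divided by reference length). The example sentences are invented for illustration and do not come from the paper or from SAP-Hypo5; the lowercasing and whitespace tokenization are simplifying assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the hypothesis prefix hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (assumed, not taken from the paper).
reference = "please turn off the kitchen light"
harmless = "please turn off the kitchen lights"  # meaning preserved
harmful = "please turn on the kitchen light"     # meaning inverted

wer_harmless = wer(reference, harmless)
wer_harmful = wer(reference, harmful)
```

Both hypotheses contain a single substitution, so they receive the same WER even though only the second changes what the speaker asked for. This is the kind of mismatch that motivates semantic metrics.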

📝 Abstract
While Automatic Speech Recognition (ASR) is typically benchmarked by word error rate (WER), real-world applications ultimately hinge on semantic fidelity. This mismatch is particularly problematic for dysarthric speech, where articulatory imprecision and disfluencies can cause severe semantic distortions. To bridge this gap, we introduce a Large Language Model (LLM)-based agent for post-ASR correction: a Judge-Editor over the top-k ASR hypotheses that keeps high-confidence spans, rewrites uncertain segments, and operates in both zero-shot and fine-tuned modes. In parallel, we release SAP-Hypo5, the largest benchmark for dysarthric speech correction, to enable reproducibility and future exploration. Under multi-perspective evaluation, our agent achieves a 14.51% WER reduction alongside substantial semantic gains, including a +7.59 pp improvement in MENLI and +7.66 pp in Slot Micro F1 on challenging samples. Our analysis further reveals that WER is highly sensitive to domain shift, whereas semantic metrics correlate more closely with downstream task performance.
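
The abstract reports Slot Micro F1 without defining it here. As a rough sketch, micro-averaged F1 pools true positives, false positives, and false negatives across all utterances before computing precision and recall. The exact-match criterion on (slot, value) pairs and the data layout below are assumptions; the paper's own matching protocol may differ.

```python
from typing import Dict, List


def slot_micro_f1(gold: List[Dict[str, str]], pred: List[Dict[str, str]]) -> float:
    """Micro-averaged F1 over (slot, value) pairs pooled across utterances."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same utterances")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_items, p_items = set(g.items()), set(p.items())
        tp += len(g_items & p_items)  # slot and value both correct
        fp += len(p_items - g_items)  # predicted but not in gold
        fn += len(g_items - p_items)  # in gold but missed or wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical slot annotations for two utterances.
gold = [{"device": "light", "state": "off"}, {"city": "boston"}]
pred = [{"device": "light", "state": "on"}, {"city": "boston"}]

score = slot_micro_f1(gold, pred)
```

Here one wrong slot value counts as both a false positive and a false negative, giving precision and recall of 2/3 each.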

Problem

Research questions and friction points this paper is trying to address.

dysarthric speech
semantic fidelity
word error rate
ASR evaluation
speech recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Agent
post-ASR correction
dysarthric speech
semantic fidelity
SAP-Hypo5
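
The keep-or-rewrite idea behind the Judge-Editor can be sketched in a few lines. This is not the authors' implementation: in the paper both roles are played by an LLM, whereas here the judge is approximated by agreement across the top-k hypotheses and the editor is a hypothetical lookup stub standing in for an LLM call. All function names and example hypotheses are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def judge(hypotheses: List[str]) -> List[Tuple[str, bool]]:
    """Mark a top-1 token as stable only if every other hypothesis aligns with it."""
    top1 = hypotheses[0].split()
    stable = [True] * len(top1)
    for other in hypotheses[1:]:
        matched = [False] * len(top1)
        sm = SequenceMatcher(a=top1, b=other.split(), autojunk=False)
        for block in sm.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                matched[i] = True
        stable = [s and m for s, m in zip(stable, matched)]
    return list(zip(top1, stable))


def edit(labeled: List[Tuple[str, bool]],
         rewrite: Callable[[List[str]], List[str]]) -> str:
    """Keep stable tokens verbatim; send each maximal uncertain span to `rewrite`."""
    out: List[str] = []
    span: List[str] = []
    for token, is_stable in labeled:
        if is_stable:
            if span:
                out.extend(rewrite(span))
                span = []
            out.append(token)
        else:
            span.append(token)
    if span:
        out.extend(rewrite(span))
    return " ".join(out)


def stub_rewrite(span: List[str]) -> List[str]:
    # Hypothetical stand-in for an LLM call; a real editor would use context.
    fixes = {"of": "off"}
    return [fixes.get(tok, tok) for tok in span]


# Invented top-3 hypotheses, best first.
hyps = [
    "turn of the kitchen light",
    "turn off the kitchen light",
    "turn on the kitchen light",
]
labeled = judge(hyps)
corrected = edit(labeled, stub_rewrite)
```

Only the token the hypotheses disagree on reaches the rewrite step; everything else is copied through unchanged, which is the local-rewriting behaviour the summary describes.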

Authors
Xiuwen Zheng
Dept. of ECE, University of Illinois Urbana-Champaign, IL, USA
Sixun Dong
Arizona State University
Computer Vision, Multimodal Learning, Visual Language Model
Bornali Phukon
Dept. of ECE, University of Illinois Urbana-Champaign, IL, USA
M. Hasegawa-Johnson
Dept. of ECE, University of Illinois Urbana-Champaign, IL, USA
C. Yoo
Dept. of EE, Korea Advanced Institute of Science & Technology, KR