🤖 AI Summary
This study addresses the limitations of current automatic speech recognition (ASR) systems in transcribing pathological speech from individuals with Huntington's disease (HD), whose speech is marked by irregular timing, unstable phonation, and articulatory distortion. Using a high-fidelity clinical speech corpus not previously used for end-to-end ASR training, the authors compare several ASR families under a unified evaluation and find that Parakeet-TDT outperforms encoder-decoder and CTC baselines. HD-specific adaptation reduces word error rate (WER) from 6.99% to 4.95%. The authors also propose biomarker-based auxiliary supervision and report that it reshapes error behavior in severity-dependent ways rather than uniformly improving WER. All code and models are open-sourced.
📝 Abstract
Automatic speech recognition (ASR) for pathological speech remains underexplored, especially for Huntington's disease (HD), where irregular timing, unstable phonation, and articulatory distortion challenge current models. We present a systematic HD-ASR study using a high-fidelity clinical speech corpus not previously used for end-to-end ASR training. We compare multiple ASR families under a unified evaluation, analyzing WER as well as substitution, deletion, and insertion patterns. HD speech induces architecture-specific error regimes, with Parakeet-TDT outperforming encoder-decoder and CTC baselines. HD-specific adaptation reduces WER from 6.99% to 4.95%. We also propose a method for biomarker-based auxiliary supervision and analyze how it reshapes error behavior in severity-dependent ways rather than uniformly improving WER. We open-source all code and models.
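
To make the evaluation metric concrete, the following is a minimal sketch of how WER and its substitution, deletion, and insertion components are conventionally computed from a word-level edit-distance alignment. It is not the authors' evaluation code: the names `ErrorCounts` and `count_errors` are hypothetical, and text normalization is assumed to be plain whitespace tokenization, which a real evaluation pipeline would likely replace with something more careful.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Word-level error breakdown for one reference/hypothesis pair (hypothetical helper)."""
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, where N is the number of reference words.
        return (self.substitutions + self.deletions + self.insertions) / self.ref_words


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align two transcripts by minimum edit distance and count S, D, and I.

    Assumption: tokens are whitespace-separated words with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_edits, S, D, I) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions can consume reference words
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions can produce hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, n)          # match, no cost
            else:
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)      # reference word missing from hypothesis
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)     # extra word in hypothesis
            # Ties are broken in favor of the diagonal move (listed first).
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])

    _, s, d, n = dp[len(ref)][len(hyp)]
    return ErrorCounts(substitutions=s, deletions=d, insertions=n, ref_words=len(ref))


if __name__ == "__main__":
    counts = count_errors("the cat sat on the mat", "the cat sit on mat")
    print(counts, round(counts.wer, 4))
```

When several alignments share the same minimum edit count, the S/D/I split can depend on tie-breaking, so breakdowns are only comparable across systems when the same alignment procedure is used for all of them. For scale, the reported drop from 6.99% to 4.95% WER is 2.04 absolute points, or roughly 29% relative.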