AI Summary
This study addresses the lack of automated, interpretable feedback in speech assessment for individuals with dysarthria. We propose a three-stage interpretable speech evaluation framework comprising intelligibility scoring, temporal mispronunciation localization, and error-type classification. Innovatively adapting pre-trained automatic speech recognition (ASR) models (Whisper and Wav2Vec 2.0) to pathological speech, we introduce the first temporally interpretable design for dysarthric speech assessment. Our approach integrates dynamic time alignment, sliding-window detection, and multi-task learning to generate fine-grained, therapy-oriented pronunciation feedback. Evaluated on speech data from six patients, the framework achieves a mispronunciation localization error of <120 ms and an error-type classification F1-score of 78.3%. The source code and an interactive demonstration are publicly released to support clinical decision-making.
Abstract
Dysarthria, a motor speech disorder, affects intelligibility and requires targeted interventions for effective communication. In this work, we investigate automated mispronunciation feedback by collecting a dysarthric speech dataset from six speakers reading two passages, annotated by a speech therapist with temporal markers and mispronunciation descriptions. We design a three-stage framework for explainable mispronunciation evaluation: (1) overall clarity scoring, (2) mispronunciation localization, and (3) mispronunciation type classification. We systematically analyze pretrained Automatic Speech Recognition (ASR) models in each stage, assessing their effectiveness in dysarthric speech evaluation (Code available at: https://github.com/augmented-human-lab/interspeech25_speechtherapy, Supplementary webpage: https://apps.ahlab.org/interspeech25_speechtherapy/). Our findings offer clinically relevant insights for automating actionable feedback for pronunciation assessment, which could enable independent practice for patients and help therapists deliver more effective interventions.