Towards Temporally Explainable Dysarthric Speech Clarity Assessment

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of automated, interpretable feedback in speech assessment for individuals with dysarthria. We propose a three-stage interpretable evaluation framework comprising intelligibility scoring, temporal mispronunciation localization, and error-type classification. We adapt pre-trained automatic speech recognition (ASR) models (Whisper and Wav2Vec 2.0) to pathological speech and present what we describe as the first temporally interpretable design for dysarthric speech assessment. The approach combines dynamic time alignment, sliding-window detection, and multi-task learning to generate fine-grained, therapy-oriented pronunciation feedback. Evaluated on speech data from six patients, the framework achieves a mispronunciation localization error below 120 ms and an error-type classification F1-score of 78.3%. The source code and an interactive demonstration are publicly released to support clinical decision-making.
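To make the sliding-window detection idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the per-frame clarity scores, the 20 ms hop, the window length, the threshold, and the function name `localize_mispronunciations` are all hypothetical. It flags windows whose mean score falls below the threshold and merges overlapping windows into time spans.

```python
from typing import List, Tuple


def localize_mispronunciations(
    scores: List[float],
    hop_s: float = 0.02,      # assumed frame hop in seconds (hypothetical)
    window: int = 5,          # assumed window length in frames (hypothetical)
    threshold: float = 0.5,   # assumed clarity threshold (hypothetical)
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose windowed mean score is below threshold."""
    if window <= 0 or len(scores) < window:
        return []

    # Slide a fixed-length window over the frame scores and keep low-scoring windows.
    flagged = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < threshold:
            flagged.append((start, start + window))  # frame indices, end exclusive

    # Merge overlapping or touching windows into contiguous spans.
    merged: List[List[int]] = []
    for s, e in flagged:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    # Convert frame indices to seconds.
    return [(s * hop_s, e * hop_s) for s, e in merged]
```

Because a window is flagged as soon as most of its frames are unclear, the returned span extends slightly beyond the low-scoring frames; a smaller window narrows the span at the cost of noisier detections.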

📝 Abstract

Dysarthria, a motor speech disorder, affects intelligibility and requires targeted interventions for effective communication. In this work, we investigate automated mispronunciation feedback by collecting a dysarthric speech dataset from six speakers reading two passages, annotated by a speech therapist with temporal markers and mispronunciation descriptions. We design a three-stage framework for explainable mispronunciation evaluation: (1) overall clarity scoring, (2) mispronunciation localization, and (3) mispronunciation type classification. We systematically analyze pretrained Automatic Speech Recognition (ASR) models in each stage, assessing their effectiveness in dysarthric speech evaluation. Our findings offer clinically relevant insights for automating actionable feedback for pronunciation assessment, which could enable independent practice for patients and help therapists deliver more effective interventions. Code is available at https://github.com/augmented-human-lab/interspeech25_speechtherapy, and a supplementary webpage is at https://apps.ahlab.org/interspeech25_speechtherapy/.

Problem

Research questions and friction points this paper is trying to address.

Automated mispronunciation feedback for dysarthric speech

Temporal localization and classification of mispronunciations

Assessing ASR models for dysarthric speech clarity evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework for explainable mispronunciation evaluation

Utilizes pretrained ASR models for dysarthric speech

Automates actionable feedback for pronunciation assessment
