Towards Temporally Explainable Dysarthric Speech Clarity Assessment

2025-05-31
AI Summary
This study addresses the lack of automated, interpretable feedback in speech assessment for individuals with dysarthria. We propose a three-stage interpretable speech evaluation framework comprising intelligibility scoring, temporal mispronunciation localization, and error-type classification. Innovatively adapting pre-trained automatic speech recognition (ASR) models (Whisper and Wav2Vec 2.0) to pathological speech, we introduce the first temporally interpretable design for dysarthric speech assessment. Our approach integrates dynamic time alignment, sliding-window detection, and multi-task learning to generate fine-grained, therapy-oriented pronunciation feedback. Evaluated on speech data from six patients, the framework achieves a mispronunciation localization error of <120 ms and an error-type classification F1-score of 78.3%. The source code and an interactive demonstration are publicly released to support clinical decision-making.

๐Ÿ“ Abstract
Dysarthria, a motor speech disorder, affects intelligibility and requires targeted interventions for effective communication. In this work, we investigate automated mispronunciation feedback by collecting a dysarthric speech dataset from six speakers reading two passages, annotated by a speech therapist with temporal markers and mispronunciation descriptions. We design a three-stage framework for explainable mispronunciation evaluation: (1) overall clarity scoring, (2) mispronunciation localization, and (3) mispronunciation type classification. We systematically analyze pretrained Automatic Speech Recognition (ASR) models in each stage, assessing their effectiveness in dysarthric speech evaluation (Code available at: https://github.com/augmented-human-lab/interspeech25_speechtherapy, Supplementary webpage: https://apps.ahlab.org/interspeech25_speechtherapy/). Our findings offer clinically relevant insights for automating actionable feedback for pronunciation assessment, which could enable independent practice for patients and help therapists deliver more effective interventions.
Problem

Research questions and friction points this paper is trying to address.

Automated mispronunciation feedback for dysarthric speech
Temporal localization and classification of mispronunciations
Assessing ASR models for dysarthric speech clarity evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework for explainable mispronunciation evaluation
Utilizes pretrained ASR models for dysarthric speech
Automates actionable feedback for pronunciation assessment
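
As a rough illustration of the second stage of the three-stage framework (mispronunciation localization), the sketch below turns per-frame mispronunciation scores into time-stamped segments and measures how far a predicted onset is from a therapist-annotated one. It is not the authors' implementation: the function names, the 20 ms frame hop, the 0.5 threshold, and the minimum duration are all hypothetical choices made only for this example, and the scores are assumed to come from some upstream model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds


def localize_mispronunciations(
    frame_scores: List[float],
    frame_hop_s: float = 0.02,     # assumed 20 ms hop between frames
    threshold: float = 0.5,        # assumed decision threshold
    min_duration_s: float = 0.04,  # assumed minimum segment length
) -> List[Segment]:
    """Group consecutive frames scoring >= threshold into time segments."""
    segments: List[Segment] = []
    run_start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, score in enumerate(list(frame_scores) + [float("-inf")]):
        if score >= threshold and run_start is None:
            run_start = i
        elif score < threshold and run_start is not None:
            start_s = run_start * frame_hop_s
            end_s = i * frame_hop_s
            # Small tolerance guards against floating-point rounding.
            if end_s - start_s >= min_duration_s - 1e-9:
                segments.append(Segment(round(start_s, 3), round(end_s, 3)))
            run_start = None
    return segments


def onset_error_ms(predicted: Segment, annotated: Segment) -> float:
    """Absolute difference between predicted and annotated onsets, in ms."""
    return abs(predicted.start_s - annotated.start_s) * 1000.0
```

Calling `localize_mispronunciations([0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.1])` keeps the three-frame run and drops the single-frame spike, since the spike is shorter than the assumed minimum duration. Simple thresholding is used here only because it is easy to inspect; a learned boundary detector could take its place without changing the segment representation.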
Authors

Seohyun Park
Korea University
Chitralekha Gupta
Senior Research Fellow at National University of Singapore
Music Information Retrieval, Audio Signal Processing, Machine Learning, Deep Learning
Michelle Kah Yian Kwan
Department of Healthcare Redesign, Alexandra Hospital, Singapore
Xinhui Fung
Department of Healthcare Redesign, Alexandra Hospital, Singapore
Alexander Wenjun Yip
Department of Healthcare Redesign, Alexandra Hospital, Singapore
Suranga Nanayakkara
Augmented Human Lab, School of Computing, National University of Singapore, Singapore