🤖 AI Summary
Dysarthric speech, characterized by high variability and low intelligibility, severely degrades automatic speech recognition (ASR) performance. To address this, the paper introduces diffusion models to dysarthric speech enhancement for the first time, proposing two diffusion-based enhancement approaches designed to bridge the distributional gap between pathological and neurotypical speech. Evaluated on two English dysarthric speech corpora, the methods demonstrate significant improvements in speech quality, intelligibility, and naturalness, as measured by objective metrics (PESQ, STOI), subjective listening tests, and ASR evaluation using Whisper-Turbo. Furthermore, fine-tuning Whisper-Turbo on the enhanced data yields substantial gains in recognition accuracy. This work establishes a novel application of diffusion models in pathological speech processing and proposes a paradigm shift, "enhancement-driven distribution alignment," to improve ASR robustness for disordered speech.
📝 Abstract
Dysarthric speech poses significant challenges for automatic speech recognition (ASR) systems due to its high variability and reduced intelligibility. In this work, we explore diffusion models for dysarthric speech enhancement, based on the hypothesis that diffusion-based enhancement moves the distribution of dysarthric speech closer to that of typical speech, which could in turn improve dysarthric speech recognition performance. We assess the effect of two diffusion-based and one signal-processing-based speech enhancement algorithms on the intelligibility and speech quality of two English dysarthric speech corpora. We apply speech enhancement to both typical and dysarthric speech and evaluate ASR performance using Whisper-Turbo, as well as the subjective and objective speech quality of the original and enhanced dysarthric speech. We also fine-tune Whisper-Turbo on the enhanced speech to assess its impact on recognition performance.