Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology

📅 2025-03-03
🤖 AI Summary
Clinical analysis of dysarthric speech is hindered by scarce annotated data and strict privacy constraints. Method: This study applies voice cloning to reproduce the speech characteristics of individual speakers with dysarthria. Using the TORGO dataset, we cloned voices of dysarthric and control speakers on a commercial platform, keeping each synthetic voice gender-matched, and had a licensed speech-language pathologist (SLP) evaluate a subset of the samples. Results: The SLP identified dysarthria correctly in all cases and speaker gender in 95% of cases, and misclassified 30% of the synthetic samples as real, which indicates high realism. We publicly release the synthetic dataset to support more generalizable and personalized AI models for dysarthria diagnosis, rehabilitation, and human–machine interaction.

📝 Abstract
This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria. Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology. Our contributions include demonstrating that voice cloning preserves dysarthric speech characteristics, analyzing differences between real and synthetic data, and discussing implications for diagnostics, rehabilitation, and communication. We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices. A licensed speech-language pathologist (SLP) evaluated a subset for dysarthria, speaker gender, and synthetic indicators. The SLP correctly identified dysarthria in all cases and speaker gender in 95% but misclassified 30% of synthetic samples as real, indicating high realism. Our results suggest synthetic speech effectively captures disordered characteristics and that voice cloning has advanced to produce high-quality data resembling real speech, even to trained professionals. This has critical implications for healthcare, where synthetic data can mitigate data scarcity, protect privacy, and enhance AI-driven diagnostics. By enabling the creation of diverse, high-quality speech datasets, voice cloning can improve generalizable models, personalize therapy, and advance assistive technologies for dysarthria. We publicly release our synthetic dataset to foster further research and collaboration, aiming to develop robust models that improve patient outcomes in speech-language pathology.
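The three figures reported above (dysarthria identification, speaker gender, and synthetic samples taken for real) are simple proportions over the rated samples. The following is a minimal sketch of how such rates could be computed from a rater's judgments. The `Rating` record, its field names, and the toy ratings are hypothetical illustrations, not the paper's data or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """One rater judgment for one audio sample (hypothetical schema)."""
    is_synthetic: bool       # ground truth: sample was produced by voice cloning
    true_dysarthric: bool    # ground truth: source speaker has dysarthria
    rated_dysarthric: bool   # rater's judgment
    true_gender: str         # ground-truth speaker gender label
    rated_gender: str        # rater's judgment
    rated_synthetic: bool    # rater flagged the sample as synthetic


def summarize(ratings):
    """Return the three proportions discussed in the abstract."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    synthetic = [r for r in ratings if r.is_synthetic]
    if not synthetic:
        raise ValueError("no synthetic samples among the ratings")
    n = len(ratings)
    return {
        # share of samples whose dysarthria status was judged correctly
        "dysarthria_accuracy": sum(
            r.rated_dysarthric == r.true_dysarthric for r in ratings) / n,
        # share of samples whose speaker gender was judged correctly
        "gender_accuracy": sum(
            r.rated_gender == r.true_gender for r in ratings) / n,
        # share of synthetic samples the rater took for real speech
        "synthetic_rated_real": sum(
            not r.rated_synthetic for r in synthetic) / len(synthetic),
    }


# Toy, made-up ratings for illustration only.
toy = [
    Rating(True, True, True, "F", "F", True),
    Rating(True, True, True, "M", "M", True),
    Rating(True, False, False, "F", "M", True),
    Rating(True, False, False, "M", "M", False),
]

if __name__ == "__main__":
    print(summarize(toy))
```

On real data, a high `synthetic_rated_real` value would mean the rater often could not tell cloned speech from recordings, which is the sense in which the abstract reports "high realism".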
Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in dysarthric speech synthesis.
Preserving dysarthric speech characteristics using voice cloning.
Enhancing AI-driven diagnostics and personalized therapy for dysarthria.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voice cloning replicates dysarthric speech patterns.
Synthetic data addresses speech pathology data scarcity.
High-quality synthetic speech resembles real speech.
Birger Moell
Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Fredrik Sand Aronsson
PhD student, Karolinska institutet
Machine learning · Speech and language impairments in neurodegenerative disorders