Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR

📅 2025-01-17
🤖 AI Summary
To address the poor robustness of speech recognition for dysarthric speech, the reliance of existing methods on text transcriptions and phoneme alignments, and their limited generalizability, this paper proposes a fully unsupervised dual-path framework jointly modeling prosody and acoustics. Leveraging wav2vec 2.0 self-supervised representations, it achieves alignment-free prosodic normalization; an adversarial acoustic conversion module further maps dysarthric speech to neurotypical acoustic characteristics, enabling compatibility with standard ASR systems. Crucially, the method requires no phoneme alignments, textual transcriptions, or speaker-specific priors, and generalizes effectively to unseen individuals and severely impaired speakers. Evaluated on the TORGO corpus, it significantly enhances performance of large pretrained ASR models—reducing word error rate by 23.6% for severely dysarthric speakers—without any ASR fine-tuning.
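Word error rate (WER), the metric quoted above, is conventionally the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace; real evaluations usually apply a fuller text normalizer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a bounded percentage.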

📝 Abstract
Automatic speech recognition (ASR) systems are well known to perform poorly on dysarthric speech. Previous works have addressed this by speaking rate modification to reduce the mismatch with typical speech. Unfortunately, these approaches rely on transcribed speech data to estimate speaking rates and phoneme durations, which might not be available for unseen speakers. Therefore, we combine unsupervised rhythm and voice conversion methods based on self-supervised speech representations to map dysarthric to typical speech. We evaluate the outputs with a large ASR model pre-trained on healthy speech without further fine-tuning and find that the proposed rhythm conversion especially improves performance for speakers of the Torgo corpus with more severe cases of dysarthria. Code and audio samples are available at https://idiap.github.io/RnV.
Problem

Research questions and friction points this paper is trying to address.

Speech Recognition
Speech Impairments
Adaptation Methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speech Clarity Enhancement
Unsupervised Learning
Speech Impaired Recognition
Karl El Hajal
EPFL
Speech Processing, Natural Language Processing, Machine Learning
Enno Hermann
Postdoc, IDIAP Research Institute, Switzerland
Speech Recognition, Speech Synthesis, Natural Language Processing, Machine Learning
Ajinkya Kulkarni
Idiap Research Institute, CH-1920 Martigny, Switzerland
Mathew Magimai.-Doss
Idiap Research Institute, CH-1920 Martigny, Switzerland