Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS

📅 2025-08-07

🤖 AI Summary

This study examines fairness biases inherent in the F5-TTS model when it is used for dysarthric speech synthesis. It systematically analyzes the model's uneven performance across three dimensions: intelligibility, speaker similarity, and prosody preservation. Using the TORGO dataset, it brings fairness metrics, including Disparate Impact and Parity Difference, to pathological speech synthesis for the first time, quantifying cloning performance across dysarthria severity levels. Results show that F5-TTS strongly favors mild-dysarthria samples: while intelligibility improves, speaker identity consistency and prosodic naturalness degrade significantly. This work applies algorithmic fairness analysis to zero-shot pathological voice cloning and uncovers structural bias in current models. It also outlines directions toward fairness-aware dysarthric speech synthesis, offering empirical evidence and methodological guidance for inclusive, equitable speech technologies.
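The three evaluation dimensions can be made concrete with simple per-utterance scores. The sketch below is an assumption, not the paper's confirmed pipeline. It scores intelligibility as word error rate (WER) between a reference transcript and an ASR transcript of the synthesized audio. Speaker similarity is cosine similarity between speaker embeddings, and prosody preservation is the Pearson correlation between time-aligned F0 contours. All function names are hypothetical. The ASR system, speaker encoder, and pitch tracker that would supply the inputs are left out.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dist[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two speaker embeddings; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation of two equal-length, time-aligned F0 contours."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("contours must be aligned and have at least two frames")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a flat contour has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

For example, `word_error_rate("the cat sat", "the cat sat down")` counts one insertion against three reference words. Lower WER means better intelligibility, while higher cosine similarity and correlation mean better speaker and prosody preservation. That difference in direction matters when these scores are later turned into pass/fail outcomes.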

📝 Abstract

Dysarthric speech poses significant challenges for developing assistive technologies, primarily due to the limited availability of data. Recent advances in neural speech synthesis, especially zero-shot voice cloning, facilitate synthetic speech generation for data augmentation; however, they may introduce biases towards dysarthric speech. In this paper, we investigate the effectiveness of the state-of-the-art F5-TTS model in cloning dysarthric speech using the TORGO dataset, focusing on intelligibility, speaker similarity, and prosody preservation. We also analyze potential biases using fairness metrics such as Disparate Impact and Parity Difference to assess disparities across dysarthria severity levels. Results show that F5-TTS exhibits a strong bias toward speech intelligibility over speaker and prosody preservation in dysarthric speech synthesis. Insights from this study can inform fairness-aware dysarthric speech synthesis, fostering the advancement of more inclusive speech technologies.
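Disparate Impact and Parity Difference are usually defined over binary outcomes. Disparate Impact is the ratio of favorable-outcome rates between an unprivileged group and a privileged (reference) group, and Parity Difference is the difference between those rates. The minimal sketch below makes hypothetical assumptions. An utterance counts as a favorable outcome when its score clears a fixed threshold, and the mild-severity group serves as the reference. The threshold, group labels, and scores are illustrative placeholders, not values from the paper.

```python
from typing import Dict, Mapping, Sequence


def favorable_rate(scores: Sequence[float], threshold: float,
                   higher_is_better: bool = True) -> float:
    """Fraction of utterances whose score counts as a favorable outcome (assumed threshold rule)."""
    if not scores:
        raise ValueError("each group needs at least one score")
    if higher_is_better:
        hits = sum(1 for s in scores if s >= threshold)
    else:
        hits = sum(1 for s in scores if s <= threshold)
    return hits / len(scores)


def fairness_report(scores_by_group: Mapping[str, Sequence[float]],
                    reference_group: str, threshold: float,
                    higher_is_better: bool = True) -> Dict[str, Dict[str, float]]:
    """Compare each group's favorable rate with the reference group's.

    disparate_impact  = rate(group) / rate(reference)
    parity_difference = rate(group) - rate(reference)
    """
    ref_rate = favorable_rate(scores_by_group[reference_group], threshold, higher_is_better)
    report: Dict[str, Dict[str, float]] = {}
    for group, scores in scores_by_group.items():
        if group == reference_group:
            continue
        rate = favorable_rate(scores, threshold, higher_is_better)
        # Disparate Impact is undefined when the reference group has no favorable outcomes.
        di = rate / ref_rate if ref_rate > 0 else float("nan")
        report[group] = {
            "favorable_rate": rate,
            "disparate_impact": di,
            "parity_difference": rate - ref_rate,
        }
    return report


# Illustrative speaker-similarity scores (made up, not from the paper).
example = {
    "mild": [0.8, 0.7, 0.9, 0.6],
    "severe": [0.5, 0.7, 0.4, 0.3],
}
report = fairness_report(example, reference_group="mild", threshold=0.65)
print(report["severe"])
# → {'favorable_rate': 0.25, 'disparate_impact': 0.3333333333333333, 'parity_difference': -0.5}
```

A Disparate Impact of 1 and a Parity Difference of 0 indicate equal favorable-outcome rates. A common convention, the "four-fifths rule," flags Disparate Impact below 0.8. For metrics where lower is better, such as WER, pass `higher_is_better=False`.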

Problem

Research questions and friction points this paper is trying to address.

Investigating bias in dysarthric speech cloning using F5-TTS

Measuring disparities across dysarthria severity levels with fairness metrics

Evaluating intelligibility vs. speaker and prosody preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

F5-TTS for dysarthric speech cloning

Fairness metrics to assess bias

Evidence that F5-TTS favors intelligibility over speaker and prosody preservation
