🤖 AI Summary
Stuttering detection is hindered by the scarcity of high-quality annotated speech data. To address this, we propose LLM-Dys -- the first large language model (LLM)-enhanced, large-scale dysfluency corpus, covering 11 types of word- and phoneme-level stuttering phenomena. Our method employs an LLM-driven, multi-level stuttering modeling framework that jointly leverages text-to-speech (TTS) synthesis and multi-granularity annotation, significantly improving the prosodic naturalness and contextual diversity of synthetic speech. Leveraging this corpus, we develop an ASR-agnostic, end-to-end stuttering detection framework that achieves state-of-the-art performance across multiple benchmarks. All data, models, and code are publicly released, establishing a reproducible and scalable foundation for clinical speech analysis.
📝 Abstract
Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the most comprehensive dysfluent speech corpus to date, built with LLM-enhanced dysfluency simulation. The dataset covers 11 dysfluency categories spanning both the word and phoneme levels. Building on this resource, we improve an end-to-end dysfluency detection framework, and experimental validation demonstrates state-of-the-art performance. All data, models, and code are open-sourced at https://github.com/Berkeley-Speech-Group/LLM-Dys.
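To make the simulation idea concrete, the sketch below shows what text-level dysfluency injection prior to TTS synthesis could look like. It is a minimal illustration only: the category names (`repetition`, `interjection`), the label format, and the helper functions are assumptions for exposition, not the actual LLM-Dys pipeline or annotation scheme.

```python
import random

# Hypothetical word-level injectors. Each returns the modified word list
# plus a (category, position) label that a detector could train against.
def inject_repetition(words, idx):
    # e.g. "the cat" -> "the the cat": duplicate the word at idx
    return words[:idx] + [words[idx], words[idx]] + words[idx + 1:], ("repetition", idx)

def inject_interjection(words, idx, filler="uh"):
    # e.g. "the cat" -> "the uh cat": insert a filler word before idx
    return words[:idx] + [filler] + words[idx:], ("interjection", idx)

def simulate(sentence, seed=0):
    """Pick a random position and dysfluency type, inject it, and
    return the dysfluent text (to be fed to a TTS model) with its label."""
    rng = random.Random(seed)  # seeded for reproducible corpus generation
    words = sentence.split()
    idx = rng.randrange(len(words))
    injector = rng.choice([inject_repetition, inject_interjection])
    out_words, label = injector(words, idx)
    return " ".join(out_words), label
```

In a full pipeline, the injected text would be synthesized by a TTS model and the `(category, position)` label converted into a time-aligned annotation; an LLM can replace the uniform random choices here with context-sensitive placement, which is what improves naturalness and diversity.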