Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection

πŸ“… 2025-05-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Stuttering detection is hindered by the scarcity of high-quality annotated speech data. To address this, we propose LLM-Dys -- the first large language model (LLM)-enhanced, large-scale dysfluency corpus, covering 11 types of word- and phoneme-level stuttering phenomena. Our method employs an LLM-driven, multi-level stuttering modeling framework that jointly leverages text-to-speech (TTS) synthesis and multi-granularity annotation, significantly improving the prosodic naturalness and contextual diversity of synthetic speech. Leveraging this corpus, we develop an ASR-agnostic, end-to-end stuttering detection framework that achieves state-of-the-art performance across multiple benchmarks. All data, models, and code are publicly released, establishing a reproducible and scalable foundational resource for clinical speech analysis.

πŸ“ Abstract
Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the most comprehensive dysfluent speech corpus with LLM-enhanced dysfluency simulation. This dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework. Experimental validation demonstrates state-of-the-art performance. All data, models, and code are open-sourced at https://github.com/Berkeley-Speech-Group/LLM-Dys.
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of high-quality annotated dysfluency data
Improving synthetic dysfluency datasets' prosody and diversity
Enhancing end-to-end dysfluency detection framework performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-enhanced dysfluency simulation for speech
Comprehensive 11-category dysfluency dataset
Open-sourced end-to-end detection framework
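The simulation idea above — injecting labeled dysfluencies into clean text before TTS synthesis — can be sketched minimally. This is an illustrative toy, not the paper's actual pipeline: the function names, the filler word, and the two categories shown (repetition and interjection, two of the paper's 11) are assumptions for illustration only.

```python
import random

# Toy sketch (assumed, not the LLM-Dys implementation): inject one
# word-level dysfluency into a transcript and return a span annotation
# that a TTS stage could later turn into labeled dysfluent speech.

def inject_repetition(words, idx):
    """Repeat the word at idx, e.g. 'I went' -> 'I I went'."""
    return words[:idx + 1] + [words[idx]] + words[idx + 1:]

def inject_interjection(words, idx, filler="uh"):
    """Insert a filler before idx, e.g. 'I went' -> 'I uh went'."""
    return words[:idx] + [filler] + words[idx:]

def simulate_dysfluency(text, seed=0):
    """Apply one randomly chosen word-level dysfluency; return the
    modified text plus a (position, label) annotation."""
    rng = random.Random(seed)
    words = text.split()
    idx = rng.randrange(len(words))
    label, fn = rng.choice([("repetition", inject_repetition),
                            ("interjection", inject_interjection)])
    return " ".join(fn(words, idx)), (idx, label)

dysfluent_text, annotation = simulate_dysfluency("I went to the store yesterday")
print(dysfluent_text, annotation)
```

In the paper's framing, an LLM replaces these hand-written rules so that insertions are contextually plausible, and the kept annotations supply word- and phoneme-level training labels for the detector.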
πŸ‘₯ Authors
Jinming Zhang (Queen Mary University of London)
Xuanru Zhou (Zhejiang University)
Jiachen Lian (UC Berkeley)
Shuhe Li (Zhejiang University, China)
William Li (UC Berkeley, USA)
Z. Ezzes (UCSF, USA)
Rian Bogley (UCSF, USA)
L. Wauters (UCSF, USA)
Zachary Miller (UCSF Memory and Aging Center)
Jet Vonk (UCSF, USA)
Brittany Morin (UCSF, USA)
Maria-Louisa Gorno-Tempini (UCSF, USA)
G. Anumanchipalli (UC Berkeley, USA)