SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

📅 2025-09-18
🤖 AI Summary
Existing methods for paralinguistic speech typically rely on proprietary datasets, while public resources (e.g., for laughter and sighs) suffer from fragmented utterances, missing or inaccurate timestamps, and limited real-world relevance, hindering progress in natural speech synthesis and paralinguistic understanding. To address these limitations, we propose the first fully automated framework for constructing large-scale paralinguistic datasets from authentic, spontaneous dialogue. Our pipeline integrates robust speech segmentation, fine-grained paralinguistic event detection (covering six categories), and precise temporal alignment. Leveraging this framework, we release SynParaSpeech, an open-source dataset comprising 118.75 hours of audio with precise timestamps. Empirical evaluation shows that text-to-speech models trained on SynParaSpeech produce more natural paralinguistic synthesis, and that the data improves paralinguistic event detection. This work establishes a scalable, reproducible paradigm for paralinguistic data curation and advances both generative and analytical research in spoken language processing.
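The summary describes a three-stage pipeline: segment the speech, detect paralinguistic events in six categories, and align them to precise timestamps. A minimal sketch of that curation logic is below; the category labels, function names, and filtering thresholds are assumptions for illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical label set: the paper covers six paralinguistic categories,
# but the exact labels are an assumption here.
CATEGORIES = {"laughter", "sigh", "breath", "cough", "throat_clear", "sob"}

@dataclass
class ParaEvent:
    label: str    # paralinguistic category
    start: float  # onset in seconds, relative to the full recording
    end: float    # offset in seconds

def curate(segments, raw_events, min_dur=0.1):
    """Keep detected events that belong to a known category, exceed a
    minimum duration, and fall entirely inside one speech segment."""
    kept = []
    for ev in raw_events:
        if ev.label not in CATEGORIES or (ev.end - ev.start) < min_dur:
            continue
        # Temporal alignment: accept the event only if a segment contains it.
        for seg_start, seg_end in segments:
            if ev.start >= seg_start and ev.end <= seg_end:
                kept.append(ParaEvent(ev.label, ev.start, ev.end))
                break
    return kept

segs = [(0.0, 5.0), (6.0, 10.0)]
events = [ParaEvent("laughter", 1.2, 1.9),   # valid, inside first segment
          ParaEvent("hum", 2.0, 2.5),        # unknown category, dropped
          ParaEvent("sigh", 5.2, 5.8)]       # falls between segments, dropped
print([e.label for e in curate(segs, events)])  # → ['laughter']
```

The segment-containment check stands in for the paper's alignment step; a real pipeline would derive both the segments and the raw events from audio models rather than taking them as inputs.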

📝 Abstract
Paralinguistic sounds, like laughter and sighs, are crucial for synthesizing more realistic and engaging speech. However, existing methods typically depend on proprietary datasets, while publicly available resources often suffer from incomplete speech, inaccurate or missing timestamps, and limited real-world relevance. To address these problems, we propose an automated framework for generating large-scale paralinguistic data and apply it to construct the SynParaSpeech dataset. The dataset comprises 6 paralinguistic categories with 118.75 hours of data and precise timestamps, all derived from natural conversational speech. Our contributions lie in introducing the first automated method for constructing large-scale paralinguistic datasets and releasing the SynParaSpeech corpus, which advances speech generation through more natural paralinguistic synthesis and enhances speech understanding by improving paralinguistic event detection. The dataset and audio samples are available at https://github.com/ShawnPi233/SynParaSpeech.
Problem

Research questions and friction points this paper is trying to address.

Automated construction of paralinguistic datasets for speech generation and understanding
Addressing incomplete speech and inaccurate timestamps in existing public resources
Improving paralinguistic event detection and natural speech synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework for generating paralinguistic datasets
Large-scale dataset with precise timestamps across six paralinguistic categories
Data derived from natural conversational speech
Bingsong Bai
Beijing University of Posts and Telecommunications
Text-to-speech, Voice Conversion, Speech Processing
Qihang Lu
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Wenbing Yang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Zihan Sun
Hello Group Inc., China
YueRan Hou
Hello Group Inc., China
Peilei Jia
Hello Group Inc., China
Songbai Pu
Hello Group Inc., China
Ruibo Fu
Associate Professor, CASIA
AIGC, LMM, Intelligent speech interaction, Deepfake detection
Yingming Gao
Beijing University of Posts and Telecommunications
Computer Assisted Language Learning, Acoustic Phonetics and Speech Synthesis
Ya Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Jun Gao
Hello Group Inc., China