🤖 AI Summary
Existing approaches to paralinguistic speech (e.g., laughter, sighs) typically rely on proprietary resources, while publicly available datasets suffer from incomplete utterances, missing or inaccurate timestamps, and limited real-world relevance, hindering progress in both natural speech synthesis and paralinguistic understanding. To address these limitations, the authors propose the first fully automated framework for constructing large-scale paralinguistic datasets from natural conversational speech. Using this framework, they build and release SynParaSpeech, an open dataset covering six paralinguistic categories with 118.75 hours of audio and precise timestamps. The corpus benefits both sides of spoken language processing: speech generation, through more natural paralinguistic synthesis, and speech understanding, through improved paralinguistic event detection.
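To make the dataset-construction idea concrete, here is a minimal, hypothetical sketch of the final step such a pipeline needs: turning per-frame paralinguistic predictions into timestamped events. The label set, frame length, and function names below are illustrative assumptions, not the paper's actual implementation or its six released categories.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical label set: the six categories are not enumerated in the abstract.
PARA_LABELS = {"laughter", "sigh", "breath", "cough", "throat-clear", "uhm"}

FRAME_S = 0.02  # 20 ms analysis frames (illustrative choice)

@dataclass(frozen=True)
class ParaEvent:
    label: str
    start_s: float  # event onset in seconds
    end_s: float    # event offset in seconds

def frames_to_events(frame_labels):
    """Collapse per-frame labels into timestamped paralinguistic events.

    `frame_labels` is a list of per-frame predictions ("speech", "silence",
    or a paralinguistic label). Contiguous runs of the same paralinguistic
    label become one event with frame-accurate start/end times.
    """
    events, t = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        if label in PARA_LABELS:
            events.append(ParaEvent(label,
                                    round(t * FRAME_S, 3),
                                    round((t + n) * FRAME_S, 3)))
        t += n
    return events

# Example: 10 speech frames, 5 laughter frames, 5 speech frames.
frames = ["speech"] * 10 + ["laughter"] * 5 + ["speech"] * 5
print(frames_to_events(frames))  # one laughter event from 0.2 s to 0.3 s
```

A real pipeline would of course precede this with speech segmentation and a learned event classifier; this sketch only shows how timestamp precision follows from the frame resolution of the detector.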
📝 Abstract
Paralinguistic sounds, such as laughter and sighs, are crucial for synthesizing more realistic and engaging speech. However, existing methods typically depend on proprietary datasets, while publicly available resources often suffer from incomplete speech, inaccurate or missing timestamps, and limited real-world relevance. To address these problems, we propose an automated framework for generating large-scale paralinguistic data and apply it to construct the SynParaSpeech dataset, which comprises six paralinguistic categories totaling 118.75 hours of audio with precise timestamps, all derived from natural conversational speech. Our contributions are twofold: we introduce the first automated method for constructing large-scale paralinguistic datasets, and we release the SynParaSpeech corpus, which advances speech generation through more natural paralinguistic synthesis and enhances speech understanding by improving paralinguistic event detection. The dataset and audio samples are available at https://github.com/ShawnPi233/SynParaSpeech.