🤖 AI Summary
To curb the escalating storage and annotation costs of rapidly growing TTS datasets, this paper proposes the first active learning framework for corpus construction in text-to-speech synthesis. Departing from conventional static, model-agnostic data collection, the approach forms a closed-loop "sampling–modeling–feedback" pipeline: it selects high-informativeness text–speech pairs by jointly scoring model uncertainty and sample diversity, then incrementally retrains the model on the enlarged corpus. Experiments show that corpora built this way significantly improve synthesized-speech naturalness, with MOS gains of +0.3 to +0.5 at matched data scale, and match full-data baseline performance using only 60% of the data, making data utilization for TTS development substantially more efficient.
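The paper does not include an implementation, but the loop it describes is straightforward to sketch. Below is a minimal, self-contained Python/NumPy illustration of greedy uncertainty-plus-diversity batch selection inside such a sampling–modeling–feedback loop; the function name `select_batch`, the weighted scoring rule, the `alpha` trade-off, and the random toy embeddings are all illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def select_batch(uncertainty, embeddings, selected, k, alpha=0.5):
    """Greedily pick k candidates by a weighted uncertainty + diversity score.

    uncertainty : (N,) informativeness of each candidate (higher = more useful)
    embeddings  : (N, D) candidate text-speech pair representations
    selected    : list of (D,) embeddings already in the corpus
    alpha       : trade-off between uncertainty and diversity
    """
    pool = list(range(len(uncertainty)))
    chosen, selected = [], list(selected)
    for _ in range(k):
        if selected:
            sel = np.stack(selected)                      # (M, D)
            # Diversity: distance from each candidate to its nearest
            # already-selected sample (larger = more novel).
            div = np.linalg.norm(
                embeddings[pool][:, None, :] - sel[None, :, :], axis=-1
            ).min(axis=1)
            div = div / (div.max() + 1e-9)                # normalize to [0, 1]
        else:
            div = np.ones(len(pool))                      # first pick: all equally novel
        score = alpha * uncertainty[pool] + (1.0 - alpha) * div
        best = pool[int(np.argmax(score))]
        chosen.append(best)
        selected.append(embeddings[best])
        pool.remove(best)
    return chosen

# Toy driver standing in for the closed-loop "sampling-modeling-feedback" pipeline.
rng = np.random.default_rng(0)
pool_emb = rng.normal(size=(500, 32))   # stand-in for text-speech pair embeddings
corpus_emb = []

for rnd in range(3):
    # In a real system these scores come from the current TTS model,
    # e.g. per-utterance reconstruction loss or predictive entropy.
    uncertainty = rng.random(len(pool_emb))
    idx = select_batch(uncertainty, pool_emb, corpus_emb, k=50)
    corpus_emb.extend(pool_emb[i] for i in idx)
    pool_emb = np.delete(pool_emb, idx, axis=0)
    # ... incrementally retrain / fine-tune the TTS model on the grown corpus here ...
    print(f"round {rnd}: corpus size = {len(corpus_emb)}")
```

In a real pipeline, the uncertainty scores would be produced by the current TTS model rather than drawn at random, and each round would end with incremental fine-tuning on the grown corpus before the next selection step.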
📝 Abstract
The construction of high-quality datasets is a cornerstone of modern text-to-speech (TTS) systems. However, the increasing scale of available data poses significant challenges, including storage constraints. To address these issues, we propose a TTS corpus construction method based on active learning. Unlike traditional corpus construction approaches, which are feed-forward (data are collected once, before any training) and model-agnostic, our method iteratively alternates between data collection and model training, focusing acquisition on the data that is most informative for model improvement. This yields a data-efficient corpus. Experimental results demonstrate that a corpus constructed with our method supports higher-quality speech synthesis than conventionally constructed corpora of the same size.