The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper reports the findings of the Singing Voice Conversion Challenge 2025, which extends singing voice conversion from singer identity conversion to singing style conversion, with a focus on the dynamic information in breathy, glissando, and vibrato singing styles. The organizers developed a new challenge database, introduced two tasks, open-sourced baseline systems, and combined objective evaluations with large-scale crowd-sourced listening tests. The challenge ran for two months, and 26 systems were evaluated. The listening test results show that the top systems reached singer identity scores comparable to ground-truth samples, but modeling the singing style, and consequently achieving high naturalness, remains difficult. The findings point to the modeling of dynamic style information as the main open problem for future work on singing voice conversion.

📝 Abstract
We present the findings of the latest iteration of the Singing Voice Conversion Challenge, a scientific event aiming to compare and understand different voice conversion systems in a controlled environment. Compared to previous iterations, which focused solely on converting the singer identity, this year we also focused on converting the singing style of the singer. To create a controlled environment and thorough evaluations, we developed a new challenge database, introduced two tasks, open-sourced baselines, and conducted large-scale crowd-sourced listening tests and objective evaluations. The challenge ran for two months, and in total we evaluated 26 different systems. The results of the large-scale crowd-sourced listening test showed that top systems had singer identity scores comparable to those of ground truth samples. However, modeling the singing style, and consequently achieving high naturalness, remains a challenge in this task, primarily due to the difficulty of modeling dynamic information in breathy, glissando, and vibrato singing styles.
Problem

Research questions and friction points this paper is trying to address.

Extending voice conversion beyond singer identity to style
Evaluating systems for modeling dynamic singing style attributes
Addressing naturalness challenges in breathy, glissando, and vibrato styles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed new challenge database for controlled evaluations
Introduced two tasks with open-sourced baseline systems
Conducted large-scale crowd-sourced listening tests and objective evaluations