Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a controllable singing voice conversion system to address key challenges in the field, including style leakage, unstable dynamic expression, and difficulty achieving high-fidelity synthesis under limited data. The approach incorporates a boundary-aware information bottleneck to suppress residual source-style artifacts, introduces an explicit frame-level technique matrix to enhance dynamic style rendering, and integrates targeted F0 processing with a perception-based high-frequency band completion mechanism to mitigate data scarcity. Evaluated in the SVCC2025 subjective assessment, the system achieves top-ranking naturalness, demonstrates strong speaker similarity and precise control over vocal techniques, and accomplishes these results using significantly less additional singing data than current state-of-the-art systems.
📝 Abstract
This paper presents the submission of the S4 team to the Singing Voice Conversion Challenge 2025 (SVCC2025), a novel singing style conversion system that advances fine-grained style conversion and control within in-domain settings. To address the critical challenges of style leakage, dynamic rendering, and high-fidelity generation with limited data, we introduce three key innovations: a boundary-aware Whisper bottleneck that pools phoneme-span representations to suppress residual source style while preserving linguistic content; an explicit frame-level technique matrix, enhanced by targeted F0 processing during inference, for stable and distinct dynamic style rendering; and a perceptually motivated high-frequency band completion strategy that leverages an auxiliary standard 48 kHz SVC model to augment the high-frequency spectrum, thereby overcoming data scarcity without overfitting. In the official SVCC2025 subjective evaluation, our system achieves the best naturalness among all submissions while maintaining competitive results in speaker similarity and technique control, despite using significantly less extra singing data than other top-performing systems. Audio samples are available online.
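The boundary-aware bottleneck described above pools encoder frames within phoneme spans, discarding frame-level prosodic detail (a carrier of source singing style) while keeping phoneme-level linguistic content. A minimal sketch of that pooling step, assuming mean pooling over aligner-supplied boundaries (the function name, toy data, and boundary format are illustrative, not the authors' implementation):

```python
import numpy as np

def phoneme_span_pool(frames: np.ndarray, boundaries) -> np.ndarray:
    """Mean-pool frame-level features within each phoneme span.

    frames:     (T, D) encoder features, e.g. from a Whisper encoder.
    boundaries: iterable of (start, end) frame indices, one per phoneme,
                end exclusive, e.g. from a forced aligner.
    Returns:    (num_phonemes, D) pooled representations, one vector
                per phoneme instead of per frame.
    """
    return np.stack([frames[s:e].mean(axis=0) for s, e in boundaries])

# Toy example: 10 frames of 4-dim features split into three phoneme spans.
feats = np.arange(40, dtype=np.float32).reshape(10, 4)
spans = [(0, 3), (3, 7), (7, 10)]
pooled = phoneme_span_pool(feats, spans)
print(pooled.shape)  # (3, 4)
```

In practice the pooled vectors would be broadcast back to frame rate (or consumed directly by the decoder), so that within-phoneme timing and pitch variation of the source can no longer leak through the content branch.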
Problem

Research questions and friction points this paper is trying to address.

singing style conversion
style leakage
dynamic rendering
high-fidelity generation
limited data
Innovation

Methods, ideas, or system contributions that make the work stand out.

boundary-aware information bottleneck
frame-level technique matrix
high-frequency band completion
singing voice conversion
F0 processing
Zhetao Hu
School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; SYKI-SPEECH Team, Xi’an, China
Yiquan Zhou
School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; SYKI-SPEECH Team, Xi’an, China
Wenyu Wang
School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; SYKI-SPEECH Team, Xi’an, China
Zhiyu Wu
DeepSeek-AI, Peking University
MLLM · Emotion Recognition · Semi-Supervised Learning
Xin Gao
Division of Music and Audio Union Wheatland Culture and Media Ltd., China
Jihua Zhu
School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; SYKI-SPEECH Team, Xi’an, China