Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling

📅 2026-02-09
🤖 AI Summary
Existing singing voice synthesis (SVS) systems rely on global timbre control. This makes it hard to model multi-singer arrangements that change over the course of a piece, and hard to model the rich acoustic textures of ensemble singing. This work proposes Tutti, a unified framework with two components. The first is a structure-aware singer prompt, which schedules the active singers according to the song's musical structure. The second is a condition-guided variational autoencoder (VAE), which learns implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that complement the explicit control features. The authors report that Tutti outperforms existing approaches in both multi-singer scheduling accuracy and the acoustic realism of choral synthesis. They present it as a new paradigm for complex multi-singer arrangement.
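The summary does not say how the structure-aware singer prompt is encoded. The sketch below shows the idea of structure-aligned scheduling under assumed details:

* Each structural section (verse, chorus, and so on) lists the singers active in it.
* Every frame in that section receives the average of those singers' embeddings as its prompt.

`Section`, `build_singer_prompt`, the frame-level layout and the averaging rule are all hypothetical. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Section:
    """One structural segment of a song (hypothetical representation)."""
    label: str          # e.g. "verse", "chorus"
    start: int          # first frame index (inclusive)
    end: int            # last frame index (exclusive)
    singers: List[str]  # IDs of singers scheduled for this section


def build_singer_prompt(
    sections: Sequence[Section],
    embeddings: Dict[str, List[float]],
    num_frames: int,
) -> List[List[float]]:
    """Return one prompt vector per frame.

    Assumption: a section's prompt is the mean embedding of its active singers.
    Frames not covered by any section keep a zero vector ("no singer").
    If sections overlap, the later section in the list overwrites earlier ones.
    """
    dim = len(next(iter(embeddings.values())))
    prompt = [[0.0] * dim for _ in range(num_frames)]
    for sec in sections:
        if not sec.singers:
            continue
        # Average the embeddings of all singers active in this section.
        mean = [
            sum(embeddings[s][i] for s in sec.singers) / len(sec.singers)
            for i in range(dim)
        ]
        # Broadcast the section-level prompt to every frame it spans.
        for t in range(max(sec.start, 0), min(sec.end, num_frames)):
            prompt[t] = list(mean)
    return prompt


if __name__ == "__main__":
    emb = {"alto": [1.0, 0.0], "tenor": [0.0, 1.0]}
    song = [
        Section("verse", 0, 2, ["alto"]),
        Section("chorus", 2, 4, ["alto", "tenor"]),
    ]
    print(build_singer_prompt(song, emb, num_frames=4))
    # [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
```

In this toy schedule, the verse is a solo and the chorus switches to a duet. A single global timbre setting cannot express that change within one song. A real system would more likely learn the combination rather than take a plain average.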

📝 Abstract
While existing Singing Voice Synthesis systems achieve high-fidelity solo performances, they are constrained by global timbre control, failing to address dynamic multi-singer arrangement and vocal texture within a single song. To address this, we propose Tutti, a unified framework designed for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt to enable flexible singer scheduling evolving with musical structure, and propose Complementary Texture Learning via Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that are complementary to explicit controls. Experiments demonstrate that Tutti excels in precise multi-singer scheduling and significantly enhances the acoustic realism of choral generation, offering a novel paradigm for complex multi-singer arrangement. Audio samples are available at https://annoauth123-ctrl.github.io/Tutii_Demo/.
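The abstract names "Complementary Texture Learning via Condition-Guided VAE" but gives no architecture or loss. The sketch below shows, under stated assumptions, how a conditional VAE can push its latent toward information that the explicit conditions do not carry. It uses the standard reparameterization trick and the closed-form KL divergence to a standard normal.

The following are hypothetical stand-ins, not the paper's method:

* the `toy_decoder`, which adds a latent "texture" offset to the explicit condition;
* the mean-squared reconstruction term;
* the `beta` weight.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def toy_decoder(cond: Sequence[float], z: Sequence[float]) -> List[float]:
    """Hypothetical decoder: the explicit condition gives the coarse target and
    the latent adds a residual 'texture' offset (same dimensionality assumed)."""
    return [c + zi for c, zi in zip(cond, z)]


def cvae_loss(x: Sequence[float], cond: Sequence[float],
              mu: Sequence[float], logvar: Sequence[float],
              beta: float = 1.0,
              rng: Optional[random.Random] = None) -> Tuple[float, float, float]:
    """Return (total, reconstruction MSE, KL) for one example."""
    rng = rng or random.Random(0)
    z = reparameterize(mu, logvar, rng)
    x_hat = toy_decoder(cond, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    kl = kl_to_standard_normal(mu, logvar)
    return recon + beta * kl, recon, kl


if __name__ == "__main__":
    print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
    total, recon, kl = cvae_loss([1.0, 2.0], [0.8, 1.9], [0.2, 0.1], [-4.0, -4.0])
    print(f"total={total:.4f} recon={recon:.6f} kl={kl:.4f}")
```

The decoder already receives the explicit condition `cond`. The KL penalty therefore makes it costly for `z` to re-encode what `cond` supplies, which tends to leave the latent carrying only the residual. This is the general intuition behind learning textures "complementary to explicit controls", although the authors' exact design may differ.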
Problem

Research questions and pain points this paper addresses.

multi-singer synthesis
timbre control
vocal texture
singing voice synthesis
choral generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-singer synthesis
structure-aware singer prompt
complementary texture learning
condition-guided VAE
vocal texture modeling