🤖 AI Summary
Existing singing voice synthesis systems are constrained by global timbre control, which makes it difficult to model dynamic multi-singer arrangements and rich acoustic textures within a single piece. This work proposes Tutti, a unified framework that enables flexible singer scheduling aligned with musical structure through a structure-aware singer prompting mechanism. It further introduces a condition-guided variational autoencoder (VAE) that learns complementary acoustic textures by jointly leveraging explicit and implicit acoustic features. By moving beyond conventional global timbre settings, the proposed method significantly outperforms existing approaches in both multi-singer scheduling accuracy and the acoustic realism of choral synthesis, establishing a new paradigm for complex polyphonic singing synthesis.
📝 Abstract
While existing singing voice synthesis systems achieve high-fidelity solo performances, they are constrained by global timbre control and fail to address dynamic multi-singer arrangement and vocal texture within a single song. To address this, we propose Tutti, a unified framework designed for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt that enables flexible singer scheduling evolving with the musical structure, and propose Complementary Texture Learning via a Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that complement explicit controls. Experiments demonstrate that Tutti excels at precise multi-singer scheduling and significantly enhances the acoustic realism of choral generation, offering a novel paradigm for complex multi-singer arrangement. Audio samples are available at https://annoauth123-ctrl.github.io/Tutii_Demo/.