🤖 AI Summary
Existing lyric editing methods either offer insufficient control over melodic consistency or rely heavily on manual alignment. This work proposes a fully diffusion-based model that enables melody-controllable singing voice synthesis without manual alignment, using only an optional timbre reference, the original vocal snippet, and the edited lyrics. The method supports flexible lyric editing while precisely retaining the original melody. The authors also introduce LyricEditBench, the first benchmark specifically designed for evaluating melody-preserving lyric editing. By combining the fully diffusion-based architecture with curriculum learning and Group Relative Policy Optimization, the approach outperforms Vevo2, the strongest comparable baseline, in both melodic fidelity and lyric adherence. Code, model weights, the evaluation benchmark, and audio samples are publicly released.
📝 Abstract
Regenerating singing voices with altered lyrics while preserving melodic consistency remains challenging, as existing methods either offer limited controllability or require laborious manual alignment. We propose YingMusic-Singer, a fully diffusion-based model enabling melody-controllable singing voice synthesis with flexible lyric manipulation. The model takes three inputs, with no manual alignment required: an optional timbre reference, a melody-providing singing clip, and the modified lyrics. Trained with curriculum learning and Group Relative Policy Optimization, YingMusic-Singer achieves stronger melody preservation and lyric adherence than Vevo2, the most comparable baseline supporting melody control without manual alignment. We also introduce LyricEditBench, the first benchmark for evaluating melody-preserving lyric modification. The code, weights, benchmark, and demos are publicly available at https://github.com/ASLP-lab/YingMusic-Singer.