🤖 AI Summary
Lyric generation faces dual challenges: achieving syllable-level precision while jointly modeling song structure (e.g., verse/chorus). Conventional line-by-line generation often yields semantic discontinuity and prosodic misalignment. This paper proposes the first end-to-end full-song lyric generation framework that unifies song-form awareness with multi-granularity syllabic constraints—operating at the word, phrase, line, and section levels. Methodologically, we design a hierarchical conditional sequence decoder incorporating structure-aware positional encoding, cross-granularity syllable alignment loss, and form-guided attention. Evaluated on a multi-style dataset, our approach achieves an 18.7% improvement in syllable accuracy and a 22.3% gain in structural consistency (F1). Human evaluation by professional lyricists confirms significant gains in naturalness and singability of generated lyrics.
📝 Abstract
Lyrics generation presents unique challenges, particularly in achieving precise syllable control while adhering to song form structures such as verses and choruses. Conventional line-by-line approaches often lead to unnatural phrasing, underscoring the need for more granular syllable management. We propose a framework for lyrics generation that enables multi-level syllable control at the word, phrase, line, and paragraph levels, aware of song form. Our approach generates complete lyrics conditioned on input text and song form, ensuring alignment with specified syllable constraints. Generated lyrics samples are available at: https://tinyurl.com/lyrics9999