🤖 AI Summary
Existing symbolic music generation research primarily addresses isolated subtasks—such as lyric generation or melody transformation—and lacks end-to-end frameworks that jointly model lyrics and melody. This paper proposes the first instruction-driven lyric-melody co-generation model. The method introduces a word-level aligned tuple representation, initializes a note tokenizer guided by musical knowledge, and models melody structure in three hierarchical stages: motif → phrase → section. It employs a music-specialized large language model, an expanded note vocabulary, and rhythm-aware scalar initialization. Evaluated on lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song synthesis, the model consistently outperforms GPT-4. To foster reproducibility and further research, the authors publicly release SongCompose, a bilingual (Chinese–English) dataset of paired lyrics and melodies.
📝 Abstract
Creating lyrics and melodies for the vocal track in a symbolic format, known as song composition, demands expert musical knowledge of melody, an advanced understanding of lyrics, and precise alignment between the two. Despite achievements in sub-tasks such as lyric generation, lyric-to-melody, and melody-to-lyric generation, a unified model for song composition has not yet been achieved. In this paper, we introduce SongComposer, a pioneering step towards a unified song composition model that can readily create symbolic lyrics and melodies following instructions. SongComposer is a music-specialized large language model (LLM) that, for the first time, integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations: 1) a flexible tuple format for word-level alignment of lyrics and melodies, 2) an extended tokenizer vocabulary for song notes, with scalar initialization based on musical knowledge to capture rhythm, and 3) a multi-stage pipeline that captures musical structure, starting with motif-level melody patterns and progressing to phrase-level structure for improved coherence. Extensive experiments demonstrate that SongComposer outperforms advanced LLMs, including GPT-4, in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. Moreover, we will release SongCompose, a large-scale training dataset containing paired lyrics and melodies in Chinese and English.
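The word-level tuple alignment described above can be pictured with a small sketch. The abstract does not specify the exact tuple fields or serialization, so the layout below (word, pitch, duration, trailing rest) and the `to_prompt_line` helper are purely illustrative assumptions, not the paper's actual format:

```python
# Hypothetical sketch of a word-level lyric-melody tuple representation.
# Each entry pairs one lyric token with a note name, a duration, and a
# trailing rest (both in beats). Field names and serialization are
# assumptions for illustration only.

from typing import NamedTuple


class LyricNote(NamedTuple):
    word: str        # lyric token (a word or syllable)
    pitch: str       # note name, e.g. "C4"
    duration: float  # note length in beats
    rest: float      # rest after the note, in beats


def to_prompt_line(entries):
    """Serialize aligned tuples into a flat token sequence an LLM could consume."""
    return " | ".join(
        f"{e.word} <{e.pitch}> <{e.duration}> <{e.rest}>" for e in entries
    )


phrase = [
    LyricNote("twin", "C4", 0.5, 0.0),
    LyricNote("kle", "C4", 0.5, 0.0),
    LyricNote("star", "G4", 1.0, 0.5),
]
print(to_prompt_line(phrase))
```

Keeping lyrics and notes in one interleaved sequence like this is what lets a single decoder-only LLM attend to both modalities at every step, rather than coordinating two separate models.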