NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding challenges of limited musicality and limited controllability in symbolic music generation. We systematically adapt the large language model (LLM) training paradigm of pretraining, fine-tuning, and reinforcement learning to this domain. Specifically, we propose CLaMP-DPO, a reinforcement learning method that jointly optimizes musical aesthetics and conditional controllability without human annotations or predefined rewards. Our approach pre-trains on a corpus of 1.6 million pieces of music, fine-tunes on roughly 9K high-quality classical scores, and is validated across multiple encoding schemes and architectural variants. Subjective A/B tests against human compositions show that the generated scores significantly outperform state-of-the-art baselines, advancing both the aesthetic quality and the practical controllability of symbolic music generation.
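As a rough illustration of how a DPO-style update can replace human preference labels with an automatic scorer, the sketch below pairs the highest- and lowest-scoring generations for a prompt and applies the standard DPO objective. The function names, the `clamp_score` callable, and the best-vs-worst pairing heuristic are assumptions for illustration, not the authors' CLaMP-DPO implementation.

```python
# Illustrative sketch only: clamp_score() and the pairing rule are assumptions,
# not NotaGen's released CLaMP-DPO code.
import torch.nn.functional as F

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective over per-sequence log-probabilities (tensors)."""
    margin = (pi_logp_chosen - ref_logp_chosen) - (pi_logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

def build_preference_pair(prompt, samples, clamp_score):
    """Rank generations with an external scorer (here, a CLaMP-style model rating
    prompt consistency and musical quality); the best sample becomes 'chosen'
    and the worst becomes 'rejected'."""
    ranked = sorted(samples, key=lambda s: clamp_score(prompt, s), reverse=True)
    return ranked[0], ranked[-1]
```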

📝 Abstract
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, subjective A/B tests show that NotaGen outperforms baseline models against human compositions, greatly advancing musical aesthetics in symbolic music generation. The project homepage is https://electricalexis.github.io/notagen-demo.
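To make the "period-composer-instrumentation" conditioning concrete, here is a minimal sketch of prepending such a prompt to a text-encoded score before generation. The field syntax and the helper name are hypothetical; the abstract does not specify the exact prompt format.

```python
# Hypothetical prompt format: the exact conditioning tokens NotaGen uses are not
# specified in the abstract, so this is an assumption for illustration only.
def make_condition_prompt(period: str, composer: str, instrumentation: str) -> str:
    """Build a 'period-composer-instrumentation' conditioning header."""
    return (f"%%period {period}\n"
            f"%%composer {composer}\n"
            f"%%instrumentation {instrumentation}\n")

# Usage: prepend the condition to the symbolic score text (e.g. ABC-style notation)
# and let the language model continue the sequence.
prompt = make_condition_prompt("Romantic", "Chopin", "Keyboard")
```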
Problem

Research questions and friction points this paper is trying to address.

Enhancing musicality in symbolic music generation
Implementing LLM training paradigms for music
Improving controllability without human annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts LLM training paradigms (pre-training, fine-tuning, reinforcement learning) to symbolic music
Introduces the CLaMP-DPO reinforcement learning method, requiring no human annotations or predefined rewards
Pre-trained on 1.6M pieces of music and fine-tuned on approximately 9K classical compositions