🤖 AI Summary
To address data scarcity in composer-specific symbolic music generation, which makes stylistic modeling difficult, this paper proposes a two-stage paradigm: "general pretraining followed by lightweight adapter-based fine-tuning." First, an autoregressive REMI language model is pretrained on a large-scale, multi-source corpus spanning pop, folk, and classical scores to acquire general musical structural knowledge. Then, composer-specific adaptation is performed via parameter-efficient adapters on small, high-quality datasets from four composers: Bach, Mozart, Beethoven, and Chopin. This work represents the first systematic application of the "general-to-specialized" learning paradigm to symbolic music generation, explicitly decoupling general pretraining from stylistic control and revealing hierarchical representations of musical concepts. Experiments demonstrate significant improvements over baselines in both stylistic accuracy and musicality metrics; notably, in subjective evaluation, professional musicians correctly identified 87% of generated excerpts as matching the target composer's style.
📝 Abstract
Despite progress in controllable symbolic music generation, data scarcity remains a challenge for certain control modalities. Composer-style music generation is a prime example, as only a few pieces per composer are available, limiting the modeling of both styles and fundamental music elements (e.g., melody, chord, rhythm). In this paper, we investigate how general music knowledge learned from a broad corpus can enhance the mastery of specific composer styles, with a focus on piano piece generation. Our approach follows a two-stage training paradigm. First, we pre-train a REMI-based music generation model on a large corpus of pop, folk, and classical music. Then, we fine-tune it on a small, human-verified dataset from four renowned composers, namely Bach, Mozart, Beethoven, and Chopin, using a lightweight adapter module to condition the model on style indicators. To evaluate the effectiveness of our approach, we conduct both objective and subjective evaluations on style accuracy and musicality. Experimental results demonstrate that our method outperforms ablations and baselines, achieving more precise composer-style modeling and better musical aesthetics. Additionally, we provide observations on how the model builds music concepts from the generality pre-training and refines its stylistic understanding through the mastery fine-tuning.
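The style-conditioning mechanism described above can be sketched as a residual bottleneck adapter whose hidden activation is shifted by a learned per-composer embedding. The sketch below is illustrative only: the class name `StyleAdapter`, the bottleneck size, and the initialization are assumptions, not details from the paper, and NumPy stands in for a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

class StyleAdapter:
    """Hypothetical bottleneck adapter conditioned on a composer style ID.

    Inserted into a frozen pretrained transformer layer; only the adapter
    parameters (and style embeddings) would be trained during fine-tuning.
    """

    def __init__(self, d_model=512, d_bottleneck=64, n_styles=4):
        # Down/up projections form the bottleneck; one embedding row
        # per composer (e.g. Bach, Mozart, Beethoven, Chopin).
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (d_bottleneck, d_model))
        self.style_emb = rng.normal(0.0, 0.02, (n_styles, d_bottleneck))

    def __call__(self, h, style_id):
        # Residual adapter: h + up(relu(down(h) + style_embedding))
        z = np.maximum(h @ self.w_down + self.style_emb[style_id], 0.0)
        return h + z @ self.w_up

adapter = StyleAdapter()
hidden = rng.normal(size=(16, 512))   # 16 token positions from the backbone
out = adapter(hidden, style_id=2)     # condition on one of the four styles
print(out.shape)                      # shape is preserved: (16, 512)
```

Because the adapter preserves the hidden dimension, it can be dropped between existing layers without modifying the pretrained weights, which is what keeps the fine-tuning stage lightweight.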