SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Symbolic piano music generation lacks scalable training signals that reflect subjective quality. Method: This work fine-tunes a symbolic piano MIDI generation model with reinforcement learning (RL), using Group Relative Policy Optimization (GRPO) with a reward given by Meta Audiobox Aesthetics scores computed on audio rendered from the model's outputs. Contribution/Results: The optimization changes multiple low-level features of the generated outputs and improves average subjective ratings in a preliminary listening study (N=14). However, over-optimization dramatically reduces the diversity of model outputs, pointing to a trade-off between the aesthetic reward and diversity. The study illustrates a cross-modal approach in which aesthetic feedback from rendered audio guides symbolic music generation, and offers empirical insight into the limits of such learned aesthetic rewards.

📝 Abstract
Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization has effects on multiple low-level features of the generated outputs, and improves the average subjective ratings in a preliminary listening study with 14 participants. We also find that over-optimization dramatically reduces diversity of model outputs.
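The core of group relative policy optimization, as commonly described, is that several outputs are sampled for the same prompt and each output's reward is standardized against the rest of its group, so no separate value network is needed. The following is a minimal sketch of that step only. The function name, the `eps` constant, and the example scores are illustrative assumptions, and the paper's own implementation may differ in its details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples drawn for the same prompt.

    `rewards` holds one scalar per sampled output, for example an aesthetic
    score computed on the rendered audio. `eps` is an assumed small constant
    that avoids division by zero when every reward in the group is equal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical aesthetic scores for four samples generated from one prompt.
example_rewards = [6.1, 7.4, 5.8, 7.1]
advantages = group_relative_advantages(example_rewards)
```

Outputs scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged. Because only the relative ranking within a group matters, the absolute scale of the aesthetic score does not need to be calibrated.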
Problem

Research questions and friction points this paper is trying to address.

Finetune symbolic music generation using audio aesthetic rewards
Assess impact on output features and subjective ratings
Investigate over-optimization effects on output diversity
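One simple way to make a loss of output diversity visible is to count how many n-grams of symbolic tokens are unique across a batch of generated sequences. The sketch below is an assumption for illustration and is not taken from the paper, which may measure diversity differently. The function name `distinct_n` and the toy pitch sequences are hypothetical.

```python
def distinct_n(sequences, n=4):
    """Return the fraction of n-grams that are unique across a batch.

    Each sequence is a list of hashable tokens (here, MIDI pitch numbers).
    A value near 1.0 means outputs rarely repeat each other or themselves;
    a low value suggests the model is collapsing onto a few patterns.
    """
    total = 0
    unique = set()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            unique.add(tuple(seq[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Toy batches: two different phrases versus the same phrase generated twice.
varied = [[60, 62, 64, 65, 67], [72, 71, 69, 67, 65]]
collapsed = [[60, 62, 64, 65, 67], [60, 62, 64, 65, 67]]
```

Tracking a statistic like this alongside the reward during fine-tuning would show whether rising aesthetic scores coincide with increasingly repetitive outputs.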
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic music generation with reinforcement learning
Audio domain aesthetic reward for tuning
Group relative policy optimization technique