SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the imbalance between structural control and multi-track orchestration complexity in symphonic music generation by proposing a three-dimensional hierarchical generative framework. The model decomposes generation along the bar, track, and event axes with a cascading decoder, which improves efficiency and scalability over 1D or 2D sequence models and supports controllable long-sequence generation. Key contributions include conditioning on a beat-quantized multi-voice harmony skeleton, the 3D cascading decoder architecture, refinement with Group Relative Policy Optimization (GRPO) driven by a cross-modal audio-perceptual reward, and a dissonance-averse sampling algorithm applied at inference. Objective evaluations indicate that both GRPO and dissonance-averse sampling improve harmonic cleanliness while maintaining melodic expression, and subjective evaluations show that the model outperforms baselines in musicality and listener preference.
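As a rough illustration of the "group relative" part of GRPO mentioned above: several samples are drawn for the same condition, each is scored, and each score is normalized against the mean and standard deviation of its own group. The sketch below shows only that normalization step. The reward values are made up, the function name `group_relative_advantages` is hypothetical, and the summary does not say whether the paper uses exactly this form.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sample's reward against the other samples in its group.

    `rewards` holds one scalar score per sample generated from the same
    condition. `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical audio-perceptual scores for four samples from one condition.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples scoring above the group mean receive positive advantages and are reinforced, while those below it are discouraged, so no separate learned value function is needed for the baseline.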

📝 Abstract

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations. Additionally, we implement a dissonance-averse sampling algorithm to suppress unintended tonal clashes during inference. Objective evaluations show that both reinforcement learning and dissonance-averse sampling effectively enhance harmonic cleanliness while maintaining melodic expression. Subjective evaluations demonstrate that SymphonyGen outperforms baselines in musicality and preference for orchestral music generation. Demo page: https://symphonygen.github.io/
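The abstract does not spell out how dissonance-averse sampling works. One plausible reading, sketched below purely as an assumption, is to lower the logits of candidate pitches that form harsh intervals with notes already sounding, then sample as usual. The interval set, the `penalty` value, and the function names `penalize_dissonance` and `sample_pitch` are all hypothetical and are not taken from the paper.

```python
import math
import random

# Interval classes (semitones mod 12) treated as harsh in this sketch:
# minor 2nd, tritone, major 7th. Illustrative assumption only.
DISSONANT_INTERVALS = {1, 6, 11}


def penalize_dissonance(logits, sounding_pitches, penalty=5.0):
    """Return a copy of `logits` (MIDI pitch -> logit) with clashing pitches pushed down."""
    adjusted = {}
    for pitch, logit in logits.items():
        # Count how many sounding notes this candidate would clash with.
        clashes = sum(
            (pitch - p) % 12 in DISSONANT_INTERVALS for p in sounding_pitches
        )
        adjusted[pitch] = logit - penalty * clashes
    return adjusted


def sample_pitch(logits, rng=random):
    """Sample one pitch from a softmax over the given logits."""
    peak = max(logits.values())  # subtract the max for numerical stability
    pitches = list(logits)
    weights = [math.exp(logits[p] - peak) for p in pitches]
    return rng.choices(pitches, weights=weights, k=1)[0]


# Toy example: a C major triad (C4, E4, G4) is sounding in other tracks.
sounding = [60, 64, 67]
logits = {72: 1.0, 73: 1.0, 66: 1.0, 76: 1.0}  # C5, C#5, F#4, E5
adjusted = penalize_dissonance(logits, sounding)
pitch = sample_pitch(adjusted, random.Random(0))
```

In this toy case C5 and E5 keep their logits, while C#5 and F#4 each clash with two chord tones and are penalized twice. A soft penalty rather than a hard mask leaves deliberate dissonance possible when the model is confident enough.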

Problem

Research questions and friction points this paper is trying to address.

symphonic music generation

complexity-control imbalance

orchestration

hierarchical structure

harmonic control

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D hierarchical generation

harmony skeleton conditioning

Group Relative Policy Optimization (GRPO)