ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI music generation models suffer from weak harmonic-melodic structural coherence and limited musical interpretability. To address this, we propose a structured generative framework integrating Schenkerian analysis with graph diffusion modeling. First, we design a phrase-fusion mechanism grounded in Schenkerian theory to explicitly model tonal hierarchical structure. Second, we extend the discrete graph diffusion model (DiGress) by incorporating hierarchical symbolic music representations and theory-informed probabilistic modeling. Third, we introduce a user-controllable generation interface enabling structure-level interventions such as harmonic progressions and tonal planning. Listening evaluations demonstrate that our method significantly outperforms state-of-the-art models in structural coherence, musicality, and controllability. To our knowledge, this is the first AI composition paradigm that simultaneously achieves theoretical interpretability, structural editability, and high-fidelity generation.
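
To make the graph-diffusion idea concrete, the sketch below runs one forward noising step and one reverse sampling step over a toy "note graph" in the style of DiGress-like discrete diffusion. This is not the paper's code: the category counts, the uniform transition matrix, and the placeholder denoiser are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' implementation): one discrete-
# diffusion step over node categories of a toy note graph. Node categories stand
# in for scale degrees; a full DiGress-style model noises edge categories with
# the same machinery (omitted here for brevity).
import numpy as np

rng = np.random.default_rng(0)

NUM_NODE_TYPES = 8   # assumed: 7 scale degrees + a "rest" category
NUM_NOTES = 6        # a toy phrase of six note events

def uniform_transition(num_types: int, beta: float) -> np.ndarray:
    """Uniform-noising transition matrix Q_t: keep the category with
    probability (1 - beta), otherwise resample uniformly."""
    return (1.0 - beta) * np.eye(num_types) + (beta / num_types) * np.ones((num_types, num_types))

def noise_categories(one_hot: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Sample noised categories x_t ~ Categorical(x_{t-1} Q)."""
    probs = one_hot @ Q
    return np.array([rng.choice(len(p), p=p / p.sum()) for p in probs])

# A toy clean phrase: one scale-degree label per note event.
x0_nodes = rng.integers(0, NUM_NODE_TYPES, size=NUM_NOTES)
x0_onehot = np.eye(NUM_NODE_TYPES)[x0_nodes]

# One forward (noising) step with a small corruption rate.
Q_nodes = uniform_transition(NUM_NODE_TYPES, beta=0.2)
xt_nodes = noise_categories(x0_onehot, Q_nodes)

# Reverse step: a real model would predict per-node categorical posteriors with
# a graph transformer; this placeholder returns uniform probabilities.
def denoiser_stub(noisy_nodes: np.ndarray) -> np.ndarray:
    return np.full((len(noisy_nodes), NUM_NODE_TYPES), 1.0 / NUM_NODE_TYPES)

posterior = denoiser_stub(xt_nodes)
x_prev = np.array([rng.choice(NUM_NODE_TYPES, p=p) for p in posterior])
print("clean:", x0_nodes, "noised:", xt_nodes, "denoised sample:", x_prev)
```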

📝 Abstract
Artificial Intelligence (AI) for music generation is undergoing rapid development, with recent symbolic models leveraging sophisticated deep learning and diffusion model algorithms. One drawback of existing models is that they lack structural cohesion, particularly in harmonic-melodic structure. Furthermore, such existing models are largely "black-box" in nature and are not musically interpretable. This paper addresses these limitations via a novel generative music framework that incorporates concepts of Schenkerian analysis (SchA) in concert with a diffusion modeling framework. This framework, which we call ProGress (Prolongation-enhanced DiGress), adapts state-of-the-art deep models for discrete diffusion (in particular, the DiGress model of Vignac et al., 2023) for interpretable and structured music generation. Concretely, our contributions include 1) novel adaptations of the DiGress model for music generation, 2) a novel SchA-inspired phrase fusion methodology, and 3) a framework allowing users to control various aspects of the generation process to create coherent musical compositions. Results from human experiments suggest superior performance over existing state-of-the-art methods.
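
The SchA-inspired phrase fusion can be pictured as stitching phrase graphs together at analytically important notes. The sketch below is an assumption-laden illustration (networkx graphs, invented node attributes, and an invented rule that joins the last structural note of one phrase to the first structural note of the next), not the paper's method.

```python
# Hedged sketch (assumptions, not the paper's implementation): "phrase fusion"
# as merging two phrase graphs at notes an analysis marks as structural
# (prolongational heads). Attributes and the fusion rule are placeholders.
import networkx as nx

def make_phrase(name: str, degrees: list[int], structural: set[int]) -> nx.Graph:
    """Build a toy phrase graph: nodes are note events carrying a scale degree
    and a flag for whether the analysis treats them as structural."""
    g = nx.Graph()
    for i, deg in enumerate(degrees):
        g.add_node(f"{name}:{i}", degree=deg, structural=(i in structural))
        if i > 0:
            g.add_edge(f"{name}:{i-1}", f"{name}:{i}", relation="melodic")
    return g

def fuse_phrases(a: nx.Graph, b: nx.Graph) -> nx.Graph:
    """Fuse phrase b onto phrase a by linking a's last structural note to b's
    first structural note with a 'prolongation' edge (assumed rule)."""
    a_anchor = [n for n, d in a.nodes(data=True) if d["structural"]][-1]
    b_anchor = [n for n, d in b.nodes(data=True) if d["structural"]][0]
    fused = nx.compose(a, b)
    fused.add_edge(a_anchor, b_anchor, relation="prolongation")
    return fused

antecedent = make_phrase("A", degrees=[1, 2, 3, 2], structural={0, 2})
consequent = make_phrase("B", degrees=[3, 2, 1, 1], structural={0, 3})
piece = fuse_phrases(antecedent, consequent)
print(piece.number_of_nodes(), "notes,", piece.number_of_edges(), "relations")
```
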
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of structural cohesion in music generation
Solves black-box nature of existing AI music models
Enhances interpretability and user control in composition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted DiGress model for structured music generation
Introduced Schenkerian analysis-inspired phrase fusion method
Developed user-controllable framework for coherent compositions (see the control sketch after this list)
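
One plausible way such structure-level control could work is inpainting-style conditioning: the user fixes harmonic labels at chosen positions and the sampler re-imposes them after every reverse step. The sketch below illustrates that idea with placeholder components; it is an assumption about the mechanism, not the paper's interface.

```python
# Illustrative sketch only (assumed mechanism): user-specified harmonic labels
# clamped during reverse diffusion so they survive to the final sample.
# All names, sizes, and the placeholder denoiser are invented.
import numpy as np

rng = np.random.default_rng(1)
NUM_CHORD_TYPES = 7          # assumed: diatonic triads I..vii
NUM_BEATS = 8

# User intervention: fix chords on beats 0 and 4 (e.g., I ... V ...),
# leaving the remaining beats (-1) for the model to fill in.
user_plan = np.full(NUM_BEATS, -1)
user_plan[0], user_plan[4] = 0, 4

def denoiser_stub(xt: np.ndarray) -> np.ndarray:
    """Placeholder for a learned denoiser: returns uniform chord posteriors."""
    return np.full((len(xt), NUM_CHORD_TYPES), 1.0 / NUM_CHORD_TYPES)

# Start from pure noise and run a few reverse steps, re-imposing the user's
# constraints after every step.
xt = rng.integers(0, NUM_CHORD_TYPES, size=NUM_BEATS)
for _ in range(5):
    probs = denoiser_stub(xt)
    xt = np.array([rng.choice(NUM_CHORD_TYPES, p=p) for p in probs])
    fixed = user_plan >= 0
    xt[fixed] = user_plan[fixed]

print("generated chord-beat plan:", xt)
```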