Efficient Long-Sequence Diffusion Modeling for Symbolic Music Generation

📅 2026-02-28
🤖 AI Summary
This work addresses two coupled challenges in symbolic music generation: the high computational cost of diffusion models on long sequences, and the need to model long-range dependencies while preserving fine-grained local details. The authors propose SMDIM, a diffusion strategy that combines structured state space models with a selective local refinement mechanism. The approach builds global musical structure at near-linear cost while applying lightweight refinement to local details. Experiments on symbolic music datasets covering Western classical, popular, and traditional folk music show that SMDIM outperforms existing state-of-the-art methods in generation quality and computational efficiency, and generalizes robustly to underexplored musical styles.

📝 Abstract
Symbolic music generation is a challenging task in multimedia generation, involving long sequences with hierarchical temporal structures, long-range dependencies, and fine-grained local details. Although recent diffusion-based models produce high-quality generations, they suffer from high training and inference costs on long symbolic sequences because of iterative denoising and costs that grow with sequence length. To address this problem, we propose a diffusion strategy named SMDIM that combines efficient global structure construction with lightweight local refinement. SMDIM uses structured state space models to capture long-range musical context at near-linear cost, and selectively refines local musical details via a hybrid refinement scheme. Experiments on a wide range of symbolic music datasets, covering Western classical music, popular music, and traditional folk music, show that SMDIM outperforms other state-of-the-art approaches in both generation quality and computational efficiency, and that it generalizes robustly to underexplored musical styles. These results show that SMDIM offers a principled solution for long-sequence symbolic music generation, including associated attributes that accompany the sequences. We provide a project webpage with audio examples and supplementary materials at https://3328702107.github.io/smdim-music/.
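
As a rough illustration of why state space models can process a sequence at cost linear in its length, the sketch below runs a generic diagonal linear recurrence over an input sequence. It is not SMDIM's architecture, which the abstract does not specify; the function name `diagonal_ssm_scan` and the parameters `a`, `b`, `c` are hypothetical and chosen only for this example.

```python
import numpy as np

def diagonal_ssm_scan(u, a, b, c):
    """Run a diagonal linear state space recurrence over a 1-D input.

    x_t = a * x_{t-1} + b * u_t   (elementwise over the state)
    y_t = c . x_t
    Each step touches the state once, so total work is O(L * N)
    for sequence length L and state size N.
    """
    u = np.asarray(u, dtype=float)
    x = np.zeros_like(a, dtype=float)   # hidden state, starts at zero
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = a * x + b * u_t             # state update
        y[t] = c @ x                    # scalar readout
    return y

# Illustrative parameters only (assumed, not taken from the paper).
a = np.array([0.5, 0.9])   # per-channel decay rates
b = np.array([1.0, 1.0])   # input weights
c = np.array([1.0, 1.0])   # output weights
y = diagonal_ssm_scan([1.0, 0.0, 0.0], a, b, c)
```

With an impulse as input, the output is the sum of the two decaying channels, so a slowly decaying channel (here `0.9`) carries information from early positions far into the sequence without any pairwise comparison between positions.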
Problem

Research questions and friction points this paper is trying to address.

symbolic music generation
long-sequence modeling
diffusion models
computational efficiency
long-range dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
symbolic music generation
state space model
long-sequence modeling
computational efficiency
Jinhan Xu
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Xing Tang
Associate Professor, Shenzhen Technology University
Recommender Systems, Data-Centric AI, Online Marketing, Large Language Model
Houpeng Yang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Haoran Zhang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Shenghua Yuan
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Jiatao Chen
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Tianming Xi
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070
Jing Wang
School of Computer Science, Hubei University of Technology, Wuhan, China 430068
Jiaojiao Yu
School of Information Engineering, Hubei University of Economics, Wuhan, China 430205; Hubei Key Laboratory of Digital Finance Innovation, Wuhan, China 430205
Guangli Xiang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China 430070