SqueezeComposer: Temporal Speed-up Is a Simple Trick for Long-form Music Composing

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of long-form music generation, namely modeling long-range dependencies and high computational cost. The authors propose a temporal acceleration–deceleration strategy: a diffusion model first generates temporally compressed music sequences (e.g., at 2×–8× speed) efficiently in the accelerated domain, and the output is then restored to its original tempo and refined. This approach reframes long-sequence generation as a hierarchical task and can be seamlessly integrated into existing models. Experimental results demonstrate significant improvements in efficiency and scalability—particularly for full-song accompaniment generation—while preserving high audio quality.

📝 Abstract
Composing coherent long-form music remains a significant challenge due to the complexity of modeling long-range dependencies and the prohibitive memory and computational requirements associated with lengthy audio representations. In this work, we propose a simple yet powerful trick: we assume that AI models can understand and generate time-accelerated (sped-up) audio at rates such as 2x, 4x, or even 8x. By first generating a high-speed version of the music, we greatly reduce the temporal length and resource requirements, making it feasible to handle long-form music that would otherwise exceed memory or computational limits. The generated audio is then restored to its original speed, recovering the full temporal structure. This temporal speed-up and slow-down strategy naturally follows the principle of hierarchical generation from abstract to detailed content, and can be conveniently applied to existing music generation models to enable long-form music generation. We instantiate this idea in SqueezeComposer, a framework that employs diffusion models for generation in the accelerated domain and refinement in the restored domain. We validate the effectiveness of this approach on two tasks: long-form music generation, which evaluates temporal-wise control (including continuation, completion, and generation from scratch), and whole-song singing accompaniment generation, which evaluates track-wise control. Experimental results demonstrate that our simple temporal speed-up trick enables efficient, scalable, and high-quality long-form music generation. Audio samples are available at https://SqueezeComposer.github.io/.
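The paper itself does not include code; as a rough illustration of the pipeline the abstract describes (compress time, generate in the accelerated domain, restore the original tempo, refine), here is a minimal sketch. The `generate_accelerated` and `refine` functions are hypothetical stand-ins for the paper's diffusion models, and the linear-interpolation resampling is a placeholder for a real time-stretch or latent-domain compression.

```python
import numpy as np

def time_compress(audio: np.ndarray, factor: int) -> np.ndarray:
    """Naive temporal speed-up: resample the signal to 1/factor of its length.

    A real system would operate on audio/latent representations with a proper
    resampler; linear interpolation here is purely illustrative.
    """
    n = len(audio)
    new_n = max(1, n // factor)
    old_t = np.linspace(0.0, 1.0, n)
    new_t = np.linspace(0.0, 1.0, new_n)
    return np.interp(new_t, old_t, audio)

def time_restore(audio: np.ndarray, target_len: int) -> np.ndarray:
    """Slow the compressed signal back down to the original length."""
    old_t = np.linspace(0.0, 1.0, len(audio))
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.interp(new_t, old_t, audio)

def generate_accelerated(compressed_len: int) -> np.ndarray:
    # Hypothetical stand-in for the diffusion model that composes music
    # directly in the accelerated (temporally squeezed) domain.
    rng = np.random.default_rng(0)
    return rng.standard_normal(compressed_len)

def refine(audio: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the refinement stage that restores detail
    # after the sequence is returned to its original tempo.
    return audio  # identity here; a real model would denoise/refine

def squeeze_compose(target_len: int, speedup: int = 4) -> np.ndarray:
    """Generate a long sequence by working at `speedup`x compression."""
    compressed = generate_accelerated(target_len // speedup)
    restored = time_restore(compressed, target_len)
    return refine(restored)

song = squeeze_compose(target_len=16, speedup=4)
print(len(song))  # → 16
```

The point of the sketch is the resource accounting: the expensive generation step only ever sees a sequence `speedup` times shorter than the final output, which is why the trick makes otherwise infeasible lengths tractable.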
Problem

Research questions and friction points this paper is trying to address.

long-form music generation
long-range dependencies
computational efficiency
temporal modeling
music coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal speed-up
long-form music generation
diffusion models
hierarchical generation
audio acceleration
Jianyi Chen
The Hong Kong University of Science and Technology
Rongxiu Zhong
JIUTIAN Research of China Mobile
Shilei Zhang
JIUTIAN Research of China Mobile
Kun Qian
Beijing Institute of Technology
Jinglei Liu
China Mobile (Hong Kong) Innovation Research Institute
Yike Guo
The Hong Kong University of Science and Technology
Wei Xue
The Hong Kong University of Science and Technology