Unifying Multitrack Music Arrangement via Reconstruction Fine-Tuning and Efficient Tokenization

📅 2024-08-27
🤖 AI Summary
Existing automatic music arrangement methods suffer from three key bottlenecks: inefficient tokenization, underutilization of pre-trained music language models, and poor fidelity and coherence in generated outputs. To address these, we propose a unified sequence-to-sequence reconstruction fine-tuning framework that jointly optimizes reconstruction learning with conditional and unconditional arrangement tasks, a combination presented as the first of its kind. We design an efficient multitrack joint tokenizer that preserves temporal alignment across tracks while compressing the representation. Our approach further integrates reconstruction-based fine-tuning, long-range structural modeling, and conditional guidance mechanisms. Evaluated on band arrangement, piano reduction, and drum arrangement, our method achieves state-of-the-art performance across all three tasks. Both objective metrics and perceptual quality evaluations show consistent gains over task-specific baselines, with improved generation quality, structural consistency, and coherence for long-form, polyphonic music.

📝 Abstract
Automatic music arrangement streamlines the creation of musical variants for composers and arrangers, reducing reliance on extensive music expertise. However, existing methods suffer from inefficient tokenization, underutilization of pre-trained music language models (LMs), and suboptimal fidelity and coherence in generated arrangements. This paper introduces an efficient multitrack music tokenizer for unconditional and conditional symbolic music generation, along with a unified sequence-to-sequence reconstruction fine-tuning objective for pre-trained music LMs that balances task-specific needs with coherence constraints. Our approach achieves state-of-the-art results on band arrangement, piano reduction, and drum arrangement, surpassing task-specific models in both objective metrics and perceptual quality. Additionally, we demonstrate that generative pretraining significantly contributes to the performance across these arrangement tasks, especially when handling long segments with complex alignment.
Problem

Research questions and friction points this paper is trying to address.

Inefficient tokenization in music arrangement methods
Underutilization of pre-trained music language models
Suboptimal fidelity and coherence in generated arrangements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient multitrack music tokenizer for generation
Unified sequence-to-sequence reconstruction fine-tuning objective
Generative pretraining enhances arrangement task performance
Longshen Ou
National University of Singapore
Music Information Retrieval, Audio Processing, Natural Language Processing
Jingwei Zhao
Sound and Music Computing Lab, School of Computing, NUS; Institute of Data Science, NUS; Integrative Sciences and Engineering Programme, NUS Graduate School
Ziyu Wang
Music X Lab, MBZUAI; NYU Shanghai
Gus G. Xia
Music X Lab, MBZUAI; NYU Shanghai
Ye Wang
Sound and Music Computing Lab, School of Computing, NUS; Institute of Data Science, NUS; Integrative Sciences and Engineering Programme, NUS Graduate School