MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

📅 2025-01-28

📈 Citations: 0

✨ Influential: 0

career value

184K/year

🤖 AI Summary

Controlling multi-track music generation with fine-grained, attribute-specific expressiveness remains challenging due to inter-track interference and limited controllability. Method: We propose a Transformer-based conditional generative model that (1) represents each track as a separate temporal event sequence to avoid cross-track interleaving; (2) incorporates expressive variant embeddings encoding instrument type, musical style, note density, and note duration for multi-attribute control; and (3) employs multi-attribute conditional embedding and deduplication-aware training to ensure stylistic consistency and compositional originality. Results: Experiments show zero training-data memorization, high style fidelity, and 92% accuracy in attribute control. The model supports precise track-level and bar-level completion. Deployed in industry collaborations and professional music production, this work establishes a scalable technical framework for controllable AI composition.

Technology Category

Application Category

📝 Abstract

We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on attributes including: instrument type, musical style, note density, polyphony level, and note duration. In order to integrate these features, we employ an alternative representation for musical material, creating a time-ordered sequence of musical events for each track and concatenating several tracks into a single sequence, rather than using a single time-ordered sequence where the musical events corresponding to different tracks are interleaved. We also propose a variation of our representation allowing for expressiveness. We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore the integration and evaluation of MIDI-GPT into commercial products, as well as several artistic works produced using it.

Problem

Research questions and friction points this paper is trying to address.

AI-generated music

customizable parameters

creative and commercial applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

MIDI-GPT

music representation

user-controlled composition

🔎 Similar Papers

M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic Algorithms, Probabilistic Methods and GPT Models in any Progression and Time Signature