M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic Algorithms, Probabilistic Methods and GPT Models in any Progression and Time Signature

📅 2024-09-19
🏛️ 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
🤖 AI Summary
This work addresses the generation of multi-minute, multitrack, structurally controllable MIDI music from natural language descriptions, with support for arbitrary time signatures and chord progressions and for editing the result afterwards. The proposed system has three core components: (1) an autoregressive GPT-style Transformer that maps a text prompt to composition parameters in JSON format; (2) a genetic algorithm for melodic elements, with musically meaningful mutations and a fitness function based on normal distributions over musical feature values that adapt to emotional (valence-arousal) parameters and playing styles; and (3) probabilistic methods, including Markov chains, that generate percussion in any time signature. In human and objective evaluations, the approach outperforms baselines on specific, musically meaningful metrics, which positions it as an alternative to purely neural network-based systems.
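As a rough illustration of the first component, the sketch below parses and validates a JSON parameter object of the kind a language model might return. The field names, the `CompositionParams` class, the `parse_params` helper, and the [-1, 1] range for valence and arousal are assumptions made for this example; the summary does not specify the paper's actual schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class CompositionParams:
    """Hypothetical container for the parameters named in the summary."""
    time_signature: tuple[int, int]   # (beats per bar, beat unit)
    scale: str
    chord_progression: list[str]
    valence: float                    # assumed range [-1, 1]
    arousal: float                    # assumed range [-1, 1]


def parse_params(raw: str) -> CompositionParams:
    """Parse model output and reject values a composer stage could not use."""
    data = json.loads(raw)
    beats, unit = data["time_signature"]
    if beats < 1 or unit not in (1, 2, 4, 8, 16, 32):
        raise ValueError(f"unsupported time signature: {beats}/{unit}")
    valence = float(data["valence"])
    arousal = float(data["arousal"])
    for name, value in (("valence", valence), ("arousal", arousal)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} out of assumed range [-1, 1]: {value}")
    return CompositionParams(
        time_signature=(int(beats), int(unit)),
        scale=str(data["scale"]),
        chord_progression=[str(c) for c in data["chord_progression"]],
        valence=valence,
        arousal=arousal,
    )


# Example input: a made-up response for a prompt asking for a tense piece in 7/8.
example = (
    '{"time_signature": [7, 8], "scale": "D dorian", '
    '"chord_progression": ["Dm", "G", "Am", "Dm"], '
    '"valence": -0.3, "arousal": 0.6}'
)
params = parse_params(example)
```

Validating at this boundary keeps malformed model output from reaching the later generation stages, which is one plausible reason to route the prompt through a structured format at all.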

📝 Abstract
This work introduces the M6(GPT)3 composer system, capable of generating complete, multi-minute musical compositions with complex structures in any time signature, in the MIDI domain from input descriptions in natural language. The system utilizes an autoregressive transformer language model to map natural language prompts to composition parameters in JSON format. The defined structure includes time signature, scales, chord progressions, and valence-arousal values, from which accompaniment, melody, bass, motif, and percussion tracks are created. We propose a genetic algorithm for the generation of melodic elements. The algorithm incorporates mutations with musical significance and a fitness function based on normal distribution and predefined musical feature values. The values adaptively evolve, influenced by emotional parameters and distinct playing styles. The system for generating percussion in any time signature utilizes probabilistic methods, including Markov chains. Through both human and objective evaluations, we demonstrate that our music generation approach outperforms baselines on specific, musically meaningful metrics, offering a viable alternative to purely neural network-based systems.
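One way to read "a fitness function based on normal distribution and predefined musical feature values" is to score each measured feature with an unnormalized Gaussian centred on its target value and average the scores. The sketch below follows that reading. The two features, their target means and spreads, and the function names are assumptions for illustration, not the paper's actual feature set or formula.

```python
import math


def gaussian_score(value: float, target: float, sigma: float) -> float:
    """Score in (0, 1]; equals 1.0 when the feature hits its target exactly."""
    return math.exp(-0.5 * ((value - target) / sigma) ** 2)


def fitness(features: dict[str, float],
            targets: dict[str, tuple[float, float]]) -> float:
    """Average Gaussian score over all targeted features."""
    scores = [
        gaussian_score(features[name], mu, sigma)
        for name, (mu, sigma) in targets.items()
    ]
    return sum(scores) / len(scores)


def mean_abs_interval(pitches: list[int]) -> float:
    """Mean absolute step between consecutive MIDI pitches, in semitones."""
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(steps) / len(steps)


# Hypothetical targets as (mean, standard deviation) per feature.
targets = {
    "mean_abs_interval": (2.0, 1.0),
    "pitch_range": (12.0, 4.0),
}

melody = [60, 62, 64, 65, 67]  # MIDI pitches: C4 D4 E4 F4 G4
features = {
    "mean_abs_interval": mean_abs_interval(melody),
    "pitch_range": float(max(melody) - min(melody)),
}
score = fitness(features, targets)
```

Under this reading, shifting a target mean or widening its spread in response to valence-arousal values would be one way for the feature values to "adaptively evolve", although the abstract does not say how the adaptation is done.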
Problem

Research questions and friction points this paper is trying to address.

Generating multi-minute MIDI music from text descriptions
Creating modifiable multitrack compositions in any time signature
Combining genetic algorithms with probabilistic methods for music generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive transformer maps text to JSON parameters
Genetic algorithm generates melodies with musical mutations
Probabilistic methods create percussion in any time signature
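The percussion idea in the last item can be sketched with a first-order Markov chain whose bar length is derived from the time signature, so the same transition table serves 4/4, 7/8, or 5/4. The transition probabilities, the event names, the 16th-note grid, and the rule that every bar starts on a kick are assumptions for this example and are not taken from the paper.

```python
import random

# Hypothetical transition table: current drum event -> weighted next events.
TRANSITIONS = {
    "kick": {"hat": 0.6, "snare": 0.3, "kick": 0.1},
    "snare": {"hat": 0.5, "kick": 0.4, "rest": 0.1},
    "hat": {"kick": 0.3, "snare": 0.3, "hat": 0.3, "rest": 0.1},
    "rest": {"kick": 0.5, "hat": 0.5},
}


def steps_per_bar(beats: int, unit: int, grid: int = 16) -> int:
    """Number of grid steps in one bar; grid counts steps per whole note."""
    if grid % unit != 0:
        raise ValueError(f"beat unit {unit} does not divide grid {grid}")
    return beats * (grid // unit)


def generate_bar(beats: int, unit: int, rng: random.Random,
                 grid: int = 16) -> list[str]:
    """Walk the chain for exactly one bar of the given time signature."""
    length = steps_per_bar(beats, unit, grid)
    pattern = ["kick"]  # assumed convention: anchor the downbeat with a kick
    while len(pattern) < length:
        options = TRANSITIONS[pattern[-1]]
        next_event = rng.choices(list(options),
                                 weights=list(options.values()))[0]
        pattern.append(next_event)
    return pattern


rng = random.Random(0)  # fixed seed so runs are repeatable
bar_7_8 = generate_bar(7, 8, rng)   # 14 sixteenth-note steps
bar_5_4 = generate_bar(5, 4, rng)   # 20 sixteenth-note steps
```

Because only the step count depends on the meter, nothing in the chain needs to change when the time signature does, which is the property the item above highlights.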
Jakub Poćwiardowski
Institute of Computer Science, Warsaw University of Technology
Mateusz Modrzejewski
Institute of Computer Science, Warsaw University of Technology
Marek S. Tatara
Department of Robotics and Decision Systems, Gdansk University of Technology