M⁶(GPT)³: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic Algorithms, Probabilistic Methods and GPT Models in any Progression and Time Signature

📅 2024-09-19

🏛️ 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the generation of multi-minute, multitrack MIDI music from natural-language descriptions, in any time signature and chord progression, with output that remains modifiable after generation. The system has three main components: (1) a GPT-based autoregressive transformer that maps the text prompt to composition parameters in JSON format; (2) a genetic algorithm for melodic elements, which uses musically meaningful mutations and a fitness function built from normal distributions around predefined musical feature values, with those values adapting to emotional parameters and playing styles; and (3) a percussion generator based on probabilistic methods, including Markov chains, that works in any time signature. In both human and objective evaluations, the approach outperforms baselines on specific, musically meaningful metrics, and the authors present it as a viable alternative to purely neural network-based systems.

📝 Abstract

This work introduces the M⁶(GPT)³ composer system, capable of generating complete, multi-minute musical compositions with complex structures in any time signature, in the MIDI domain, from input descriptions in natural language. The system uses an autoregressive transformer language model to map natural language prompts to composition parameters in JSON format. The defined structure includes time signature, scales, chord progressions, and valence-arousal values, from which accompaniment, melody, bass, motif, and percussion tracks are created. We propose a genetic algorithm for the generation of melodic elements. The algorithm incorporates mutations with musical significance and a fitness function based on normal distribution and predefined musical feature values. The values adaptively evolve, influenced by emotional parameters and distinct playing styles. The system for generating percussion in any time signature uses probabilistic methods, including Markov chains. Through both human and objective evaluations, we demonstrate that our music generation approach outperforms baselines on specific, musically meaningful metrics, offering a viable alternative to purely neural network-based systems.
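
To make the JSON parameter structure in the abstract concrete, here is a minimal Python sketch of how such a payload might be parsed and validated. The field names, the example values, and the [-1, 1] range for valence and arousal are assumptions for illustration; the paper's actual schema is not given on this page.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CompositionParams:
    """Hypothetical container for the parameters the abstract lists."""
    time_signature: Tuple[int, int]   # (numerator, denominator)
    scale: str
    chord_progression: List[str]
    valence: float                    # assumed range: -1.0 .. 1.0
    arousal: float                    # assumed range: -1.0 .. 1.0


def parse_params(raw: str) -> CompositionParams:
    """Parse a JSON string (e.g. language-model output) and validate it."""
    data = json.loads(raw)
    numerator, denominator = data["time_signature"]
    if numerator < 1 or denominator not in (1, 2, 4, 8, 16, 32):
        raise ValueError(f"unsupported time signature {numerator}/{denominator}")
    valence = float(data["valence"])
    arousal = float(data["arousal"])
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    return CompositionParams(
        time_signature=(numerator, denominator),
        scale=str(data["scale"]),
        chord_progression=list(data["chord_progression"]),
        valence=valence,
        arousal=arousal,
    )


# Illustrative payload, not taken from the paper.
example = """
{
  "time_signature": [7, 8],
  "scale": "D dorian",
  "chord_progression": ["Dm7", "G7", "Cmaj7", "Am7"],
  "valence": 0.4,
  "arousal": -0.2
}
"""
params = parse_params(example)
```

Validating the model output before generation is one plausible design choice: it keeps malformed or out-of-range parameters from reaching the track generators.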

Problem

Research questions and friction points this paper is trying to address.

Generating multi-minute MIDI music from text descriptions

Creating modifiable multitrack compositions in any time signature

Combining genetic algorithms with probabilistic methods for music generation
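
The genetic-algorithm side can be sketched as follows, assuming a fitness function that scores each musical feature with a Gaussian centred on a target value, as the abstract describes in general terms. The two features, the target numbers, the arousal-dependent shift, and the scale-step mutation are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random

SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, one octave (assumed)


def extract_features(melody):
    """Melody is a list of MIDI pitches, with None marking a rest."""
    notes = [p for p in melody if p is not None]
    density = len(notes) / len(melody)
    if len(notes) < 2:
        mean_interval = 0.0
    else:
        steps = [abs(b - a) for a, b in zip(notes, notes[1:])]
        mean_interval = sum(steps) / len(steps)
    return {"note_density": density, "mean_interval": mean_interval}


def gaussian_score(value, mean, std):
    """Unnormalised normal density: 1.0 at the target, falling off with distance."""
    return math.exp(-0.5 * ((value - mean) / std) ** 2)


def targets_for(arousal):
    """Hypothetical emotion adaptation: higher arousal asks for denser melodies."""
    return {
        "note_density": (0.6 + 0.3 * arousal, 0.15),  # (mean, std)
        "mean_interval": (2.5, 1.0),
    }


def fitness(melody, targets):
    feats = extract_features(melody)
    scores = [gaussian_score(feats[k], m, s) for k, (m, s) in targets.items()]
    return sum(scores) / len(scores)


def mutate(melody, rng):
    """Musically constrained mutation: toggle a rest or move one scale step."""
    child = list(melody)
    i = rng.randrange(len(child))
    if child[i] is None or rng.random() < 0.2:
        child[i] = rng.choice(SCALE) if child[i] is None else None
    else:
        idx = SCALE.index(child[i]) + rng.choice([-1, 1])
        child[i] = SCALE[min(len(SCALE) - 1, max(0, idx))]
    return child


def evolve(targets, length=16, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(SCALE + [None]) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, targets), reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        pop = elite + [mutate(rng.choice(elite), rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda m: fitness(m, targets))


best = evolve(targets_for(arousal=0.5))
```

Because every mutation stays inside the scale, any melody produced here is diatonic by construction; the fitness function only has to steer density and contour.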

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive transformer maps text to JSON parameters

Genetic algorithm generates melodies with musical mutations

Probabilistic methods create percussion in any time signature
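
As an illustration of percussion generation in any time signature, the sketch below uses a first-order Markov chain over drum events and derives the bar length from the time signature on a sixteenth-note grid. The states, transition probabilities, and the forced kick on the downbeat are assumptions made for this example; the paper's percussion model may differ.

```python
import random

# Hypothetical transition table: current event -> probabilities of the next one.
TRANSITIONS = {
    "kick":  {"kick": 0.05, "snare": 0.25, "hat": 0.50, "rest": 0.20},
    "snare": {"kick": 0.30, "snare": 0.05, "hat": 0.45, "rest": 0.20},
    "hat":   {"kick": 0.30, "snare": 0.25, "hat": 0.30, "rest": 0.15},
    "rest":  {"kick": 0.35, "snare": 0.25, "hat": 0.35, "rest": 0.05},
}


def steps_per_bar(numerator, denominator, grid=16):
    """Number of grid steps in one bar, e.g. 7/8 on a 16th grid -> 14 steps."""
    if grid % denominator != 0:
        raise ValueError(f"denominator {denominator} does not fit a 1/{grid} grid")
    return numerator * (grid // denominator)


def generate_bar(numerator, denominator, rng):
    """Walk the Markov chain for exactly one bar, anchored by a downbeat kick."""
    n_steps = steps_per_bar(numerator, denominator)
    pattern = ["kick"]
    while len(pattern) < n_steps:
        row = TRANSITIONS[pattern[-1]]
        next_event = rng.choices(list(row), weights=list(row.values()))[0]
        pattern.append(next_event)
    return pattern


rng = random.Random(42)
bar_7_8 = generate_bar(7, 8, rng)   # 14 sixteenth-note steps
bar_5_4 = generate_bar(5, 4, rng)   # 20 sixteenth-note steps
```

Only the step count depends on the meter, so the same transition table can serve 4/4, 5/4, or 7/8 without change.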
