🤖 AI Summary
Current AI-based music generation methods are limited in duration, audio quality, and controllability. To address these limitations, we propose CoComposer, a large language model (LLM)-driven multi-agent collaborative composition system that emulates the human compositional workflow. It comprises five specialized agents jointly responsible for melody generation, harmonic progression, structural planning, orchestration, and refinement. The multi-agent design is intended to support longer pieces, coherent structure, and fine-grained editing. We evaluate the system with three LLMs (GPT-4o, DeepSeek-V3-0324, and Gemini-2.5-Flash) and use AudioBox-Aesthetics for objective assessment on four compositional criteria. Experimental results show that CoComposer outperforms existing multi-agent LLM-based systems in music quality and a single-agent system in production complexity. Compared to the non-LLM MusicLM, it offers better interpretability and editability, although MusicLM still produces better music.
📝 Abstract
Existing AI music composition tools are limited in generation duration, musical quality, and controllability. We introduce CoComposer, a multi-agent system consisting of five collaborating agents, each assigned a task based on the traditional music composition workflow. Using the AudioBox-Aesthetics system, we experimentally evaluate CoComposer on four compositional criteria. We test with three LLMs (GPT-4o, DeepSeek-V3-0324, Gemini-2.5-Flash) and find that CoComposer (1) outperforms existing multi-agent LLM-based systems in music quality and (2) outperforms a single-agent system in production complexity. Compared to the non-LLM MusicLM, CoComposer has better interpretability and editability, although MusicLM still produces better music.