🤖 AI Summary
Prior AI-based music mixing research focuses predominantly on end-to-end audio generation and neglects collaborative guidance and skill transfer, both of which are critical for empowering amateur producers. Method: We introduce MixAssist, the first audio-language multimodal dialogue dataset explicitly designed for collaborative mixing instruction. Constructed from authentic expert-novice co-mixing sessions, it aligns audio segments with natural-language instructions and feedback at a fine-grained level. Crucially, we treat audio as the primary modality for dialogue modeling and perform instruction tuning on Qwen-Audio. Results: Our tuned model generates significantly more contextually grounded and actionable mixing suggestions than baselines, as validated by both LLM-as-a-judge evaluation and expert assessment. This work addresses key gaps in AI-supported creative collaboration, contextual instruction understanding, and music education.
📝 Abstract
While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this gap, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge assessments and human expert comparisons, demonstrate that fine-tuning models such as Qwen-Audio on MixAssist can yield promising results, with the fine-tuned Qwen-Audio model significantly outperforming the other tested models in generating helpful, contextually relevant mixing advice. By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of AI assistants designed to support and augment the creative process in music mixing.
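To make the idea of "audio-grounded conversational turns" concrete, the following is a minimal sketch of how such a session might be represented and turned into instruction-tuning pairs. It is an illustration only: the `Turn` fields, the speaker labels, the file names, the `build_examples` helper, and the `max_context` window are all assumptions made here, not the actual MixAssist schema or the authors' preprocessing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    """One conversational turn (hypothetical schema, not the real dataset format)."""
    speaker: str     # assumed labels: "expert" or "amateur"
    text: str        # what was said in this turn
    audio_path: str  # assumed pointer to the audio segment under discussion


def build_examples(session: List[Turn], max_context: int = 3) -> List[Dict[str, str]]:
    """Pair each expert turn with the preceding dialogue and its audio segment."""
    examples = []
    for i, turn in enumerate(session):
        if turn.speaker != "expert":
            continue  # only expert turns serve as training targets in this sketch
        context = session[max(0, i - max_context):i]
        prompt = "\n".join(f"{t.speaker}: {t.text}" for t in context)
        examples.append(
            {"audio": turn.audio_path, "prompt": prompt, "response": turn.text}
        )
    return examples


# Invented sample dialogue, for illustration only.
session = [
    Turn("amateur", "The vocals sound buried in the chorus.", "seg_001.wav"),
    Turn("expert", "Try cutting some low mids on the guitars to make room.", "seg_001.wav"),
    Turn("amateur", "Better. Should I add compression?", "seg_002.wav"),
    Turn("expert", "A little, with a slow attack so the transients survive.", "seg_002.wav"),
]

examples = build_examples(session, max_context=2)
```

Each resulting example keeps the audio reference alongside the dialogue context, which is the property the abstract emphasizes: advice is conditioned on what is being heard, not on the text alone.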