MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

📅 2025-07-08

🤖 AI Summary

Problem: Prior AI-based music mixing research focuses predominantly on end-to-end audio generation and neglects collaborative guidance and skill transfer, both of which are critical for empowering amateur producers. Method: We introduce MixAssist, the first audio-language multimodal dialogue dataset explicitly designed for collaborative mixing instruction. Constructed from authentic expert–novice co-mixing sessions, it aligns audio segments with natural-language instructions and feedback at a fine-grained level. Crucially, we treat audio as the primary modality for dialogue modeling and perform instruction tuning on Qwen-Audio. Results: Our tuned model generates significantly more contextually grounded and actionable mixing suggestions than the baselines, as validated by both LLM-as-a-judge evaluation and expert assessment. This work addresses key gaps in AI-supported creative collaboration, contextual instruction understanding, and music education.

📝 Abstract

While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge assessments and human expert comparisons, demonstrate that fine-tuning models such as Qwen-Audio on MixAssist can yield promising results, with Qwen significantly outperforming other tested models in generating helpful, contextually relevant mixing advice. By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of intelligent AI assistants designed to support and augment the creative process in music mixing.
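To make the idea of an "audio-grounded conversational turn" concrete, the sketch below shows one way such a record might be represented and turned into a multi-turn fine-tuning example. This is only an illustration: the class `MixTurn`, its field names, and the helper functions are hypothetical and are not taken from the MixAssist release, whose actual schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MixTurn:
    """One audio-grounded conversational turn (hypothetical schema)."""
    session_id: str    # which co-mixing session the turn came from
    turn_index: int    # position of the turn within its session
    audio_path: str    # audio segment the dialogue refers to
    amateur_text: str  # what the amateur producer said or asked
    expert_text: str   # the expert's advice (the fine-tuning target)


def group_by_session(turns):
    """Return {session_id: [turns sorted by turn_index]}."""
    sessions = defaultdict(list)
    for turn in turns:
        sessions[turn.session_id].append(turn)
    return {
        sid: sorted(ts, key=lambda t: t.turn_index)
        for sid, ts in sessions.items()
    }


def to_training_example(turn, history):
    """Build a prompt/target pair from a turn and the turns preceding it."""
    context = "\n".join(
        f"Amateur: {h.amateur_text}\nExpert: {h.expert_text}" for h in history
    )
    prompt = (context + "\n" if context else "") + (
        f"Amateur: {turn.amateur_text}\nExpert:"
    )
    return {
        "audio": turn.audio_path,
        "prompt": prompt,
        "target": turn.expert_text,
    }
```

Keeping the audio path alongside the dialogue history reflects the abstract's emphasis on advice that is grounded in what the producers are actually hearing, rather than on text alone.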

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of collaborative AI in music mixing workflows

Providing instructional support for amateur music producers

Enhancing audio-language models for real-world production dialogues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio-language dataset for co-creative music mixing

Capture of multi-turn expert-amateur dialogue

Fine-tuning Qwen-Audio for contextual advice
