TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of real-world annotated data for multimodal conversational music recommendation, this paper proposes a multi-agent synthetic data generation pipeline. LLM agents are assigned roles with specialized prompts and access to different parts of the information, and conversations between a Listener LLM and a Recsys LLM are logged as chat data. Each conversation is conditioned on a conversation goal to cover diverse scenarios, and all agents are multimodal, handling both audio and images. Data quality is assessed in two ways: LLM-as-a-judge evaluation and human subjective evaluation. In these evaluations, the dataset meets its proposed goals in aspects relevant to training generative music recommendation models. The dataset and generation code are publicly released.

📝 Abstract
We present TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. In the TalkPlayData 2 pipeline, multiple large language model (LLM) agents are created under various roles, each with a specialized prompt and access to a different part of the information, and the chat data is acquired by logging the conversation between the Listener LLM and the Recsys LLM. To cover various conversation scenarios, the Listener LLM is conditioned on a finetuned conversation goal for each conversation. Finally, all the LLMs are multimodal, handling audio and images, which allows a simulation of multimodal recommendation and conversation. In the LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved the proposed goal in various aspects related to training a generative recommendation model for music. TalkPlayData 2 and its generation code are open-sourced at https://talkpl.ai/talkplaydata2.html.
Problem

Research questions and friction points this paper is trying to address.

Generating synthetic multimodal conversational music recommendation data
Simulating diverse conversation scenarios using specialized LLM agents
Creating training data for generative music recommendation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic pipeline with multiple LLM agents
Multimodal LLMs (audio and images) simulate multimodal recommendation and conversation
Synthetic dataset for music recommendation training
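
The logging loop described in the abstract, where a Listener agent and a Recsys agent each see a different part of the information and their exchange is recorded as chat data, could look roughly like the sketch below. This is only an illustration under stated assumptions, not the released generation code: `stub_listener`, `stub_recsys`, `simulate`, and `Conversation` are hypothetical names, the agents are plain Python functions standing in for multimodal LLM calls, and the exact split of information (goal visible only to the listener, catalog only to the recommender) is assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical type: an agent maps (its private context, shared chat history) to a reply.
Agent = Callable[[Dict[str, Any], List[Dict[str, str]]], str]


@dataclass
class Conversation:
    """One logged conversation plus the goal that conditioned the listener."""
    goal: str
    turns: List[Dict[str, str]] = field(default_factory=list)


def stub_listener(context: Dict[str, Any], history: List[Dict[str, str]]) -> str:
    # Stand-in for a Listener LLM call; a real agent would prompt a model here.
    turn = len(history) // 2 + 1
    return f"[listener turn {turn}] request toward goal: {context['goal']}"


def stub_recsys(context: Dict[str, Any], history: List[Dict[str, str]]) -> str:
    # Stand-in for a Recsys LLM call; cycles through the catalog deterministically.
    catalog = context["catalog"]
    index = ((len(history) - 1) // 2) % len(catalog)
    return f"[recsys] recommending {catalog[index]}"


def simulate(listener: Agent, recsys: Agent, goal: str,
             catalog: List[str], num_turns: int = 3) -> Conversation:
    # Assumption: each agent gets its own context, so information stays role-specific.
    listener_context = {"goal": goal}
    recsys_context = {"catalog": catalog}
    conversation = Conversation(goal=goal)
    for _ in range(num_turns):
        user_message = listener(listener_context, conversation.turns)
        conversation.turns.append({"role": "listener", "text": user_message})
        system_message = recsys(recsys_context, conversation.turns)
        conversation.turns.append({"role": "recsys", "text": system_message})
    return conversation


if __name__ == "__main__":
    log = simulate(stub_listener, stub_recsys,
                   goal="find calm piano music", catalog=["track_a", "track_b"])
    for turn in log.turns:
        print(turn["role"], "|", turn["text"])
```

Keeping the two contexts separate is the point of the sketch: the recorded turns are the only channel between the agents, which mirrors the abstract's description of agents with access to different parts of the information.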