TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation

📅 2025-08-18

🤖 AI Summary

To address the scarcity of real-world annotated data for multimodal conversational music recommendation, this paper proposes a multi-agent synthetic data generation pipeline. Role-playing large language model (LLM) agents, each with a specialized prompt and access to a different part of the information, simulate user-system interactions; the conversation between a Listener LLM and a Recsys LLM is logged as the chat data. Each conversation is conditioned on a conversation goal to cover diverse scenarios, and all agents are multimodal LLMs that handle audio and images. Data quality is assessed in two ways: automated LLM-as-a-judge evaluation and human subjective evaluation. In these evaluations, the dataset met its design goals in aspects relevant to training generative music recommendation models. The source code and dataset are publicly released.

📝 Abstract

We present TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. In the TalkPlayData 2 pipeline, multiple large language model (LLM) agents are created under various roles with specialized prompts and access to different parts of information, and the chat data is acquired by logging the conversation between the Listener LLM and the Recsys LLM. To cover various conversation scenarios, the Listener LLM is conditioned on a finetuned conversation goal for each conversation. Finally, all the LLMs are multimodal with audio and images, allowing a simulation of multimodal recommendation and conversation. In the LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved the proposed goal in various aspects related to training a generative recommendation model for music. TalkPlayData 2 and its generation code are open-sourced at https://talkpl.ai/talkplaydata2.html.
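
The logging loop described in the abstract can be illustrated with a minimal sketch. This is not the released generation code: the `Agent` class, `simulate_conversation`, the stub reply functions, and the example goal are all hypothetical names introduced here, and the stubs stand in for real multimodal LLM calls. It only shows the shape of the idea, namely two role-specific agents taking turns while every message is appended to a chat log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM backend: (system_prompt, history) -> reply text.
ReplyFn = Callable[[str, List[Dict[str, str]]], str]


@dataclass
class Agent:
    role: str            # e.g. "listener" or "recsys"
    system_prompt: str   # role-specific prompt (assumed format)
    reply_fn: ReplyFn    # stub used in place of a real model call

    def respond(self, history: List[Dict[str, str]]) -> str:
        return self.reply_fn(self.system_prompt, history)


def simulate_conversation(listener: Agent, recsys: Agent,
                          num_turns: int) -> List[Dict[str, str]]:
    """Alternate listener and recsys messages and return the logged chat."""
    log: List[Dict[str, str]] = []
    for _ in range(num_turns):
        log.append({"role": listener.role, "content": listener.respond(log)})
        log.append({"role": recsys.role, "content": recsys.respond(log)})
    return log


def stub_listener(prompt: str, history: List[Dict[str, str]]) -> str:
    # Each completed turn adds two messages, so this counts the current turn.
    turn = len(history) // 2 + 1
    return f"[turn {turn}] request guided by: {prompt}"


def stub_recsys(prompt: str, history: List[Dict[str, str]]) -> str:
    # Reply to the most recent listener message.
    return f"recommendation for: {history[-1]['content']}"


# Hypothetical conversation goal used to condition the listener agent.
goal = "find upbeat tracks for a morning run"
listener = Agent("listener", f"goal={goal}", stub_listener)
recsys = Agent("recsys", "recommend from the track pool", stub_recsys)
chat_log = simulate_conversation(listener, recsys, num_turns=2)
```

In this sketch the conversation goal reaches only the listener's prompt, mirroring the abstract's point that agents see different parts of the information; swapping the stubs for real model calls would leave the logging loop unchanged.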

Problem

Research questions and friction points this paper is trying to address.

Generating synthetic multimodal conversational music recommendation data

Simulating diverse conversation scenarios using specialized LLM agents

Creating training data for generative music recommendation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic pipeline with multiple LLM agents

Multimodal LLMs with audio and image inputs simulate multimodal recommendation and conversation

Synthetic dataset for music recommendation training
