From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

📅 2025-04-21
🤖 AI Summary
Conversational recommender systems (CRS) face significant challenges in low-resource settings, including scarce domain-specific dialogue data, high annotation costs, and privacy constraints. To address these, we propose the first active learning-driven dialogue synthesis framework tailored for black-box large language models (LLMs), requiring neither fine-tuning nor access to internal LLM parameters. Using prompt engineering, our method integrates heterogeneous non-dialogue data (item metadata, user reviews, and collaborative signals) to select highly informative seeds and to generate structured, semantically consistent dialogues. Empirically, the approach substantially improves zero-shot LLM-based CRS performance in sparse-data regimes, enhancing both recommendation accuracy and dialogue coherence. Moreover, it enables lightweight supervised models trained on synthesized data to approach the performance of fully supervised baselines. This work constitutes the first demonstration of high-quality CRS construction without any human-annotated dialogue data.

📝 Abstract
Conversational recommender systems (CRS) typically require extensive domain-specific conversational datasets, yet high costs, privacy concerns, and data-collection challenges severely limit their availability. Although Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities, practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints, especially in sensitive or rapidly evolving domains. However, training these smaller models effectively still demands substantial domain-specific conversational data, which remains challenging to obtain. To address these limitations, we propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques. Specifically, our method utilizes publicly available non-conversational domain data, including item metadata, user reviews, and collaborative signals, as seed inputs. By employing active learning strategies to select the most informative seed samples, our approach efficiently guides LLMs to generate synthetic, semantically coherent conversational interactions tailored explicitly to the target domain. Extensive experiments validate that conversational data generated by our proposed framework significantly improves the performance of LLM-based CRS models, effectively addressing the challenges of building CRS in no- or low-resource scenarios.
Problem

Research questions and friction points this paper is trying to address.

Addressing lack of domain-specific conversational data for CRS
Enabling zero-shot LLM-based CRS with scalable data synthesis
Improving small CRS models via active learning-guided LLM augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning guides LLM data synthesis
Leverages non-conversational data as seeds
Generates domain-specific synthetic dialogues
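
As an illustration of the three ideas above, the following Python sketch selects a small, diverse set of seeds from non-conversational data and turns each one into a synthesis prompt. It is not the paper's implementation: the abstract does not say which active learning strategy is used, so greedy k-center (farthest-first) selection over bag-of-words review vectors is assumed here. The names SEEDS, select_seeds, and build_prompt, the toy data, and the prompt wording are all hypothetical, and the LLM call itself is omitted.

```python
import math
from collections import Counter

# Hypothetical toy seeds: item metadata plus one user review each.
SEEDS = [
    {"item": "Inception", "genre": "sci-fi", "review": "mind bending dream heist thriller"},
    {"item": "Interstellar", "genre": "sci-fi", "review": "mind bending space thriller"},
    {"item": "Paddington", "genre": "family", "review": "warm funny bear adventure"},
]


def bow(text):
    """Lowercased bag-of-words counts for a piece of text."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def select_seeds(seeds, budget):
    """Assumed strategy: greedy k-center selection.

    Starts from the first seed, then repeatedly adds the unchosen seed
    that is farthest from everything already chosen.
    """
    if not seeds or budget <= 0:
        return []
    vectors = [bow(s["review"]) for s in seeds]
    chosen = [0]
    # Distance from each seed to its nearest chosen seed.
    min_dist = [cosine_distance(vectors[0], v) for v in vectors]
    limit = min(budget, len(seeds))
    while len(chosen) < limit:
        candidates = [i for i in range(len(seeds)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, cosine_distance(vectors[nxt], v)) for d, v in zip(min_dist, vectors)]
    return [seeds[i] for i in chosen]


def build_prompt(seed):
    """Format one seed as a prompt for a black-box LLM (wording is illustrative)."""
    return (
        "Write a short conversation in which a user looks for a recommendation "
        "and the system suggests the item below.\n"
        f"Item: {seed['item']}\n"
        f"Genre: {seed['genre']}\n"
        f"User review: {seed['review']}\n"
    )
```

With the toy data, select_seeds(SEEDS, 2) keeps Inception and Paddington, the two seeds whose reviews share no words. That is the intended effect of diversity-based selection: near-duplicate seeds are skipped so that the generation budget is spent on distinct items.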