AI Summary
High-quality goal-oriented dialogue data for e-commerce conversational systems is scarce, and existing models struggle to accurately comprehend user intent and efficiently retrieve relevant products. Method: This paper proposes a novel dialogue planning framework integrating large language models (LLMs) with decision-tree-guided reasoning. It introduces a decision-tree-based mechanism that predicts optimal search paths, enabling precise product localization via minimal conditional chains. Additionally, we construct WoS, the first high-quality, goal-oriented e-commerce dialogue dataset (3.6K dialogues), generated by LLMs augmented with domain-specific knowledge injection and rigorous human verification to ensure both naturalness and domain expertise. Contribution/Results: Experiments demonstrate that WoS substantially outperforms baseline datasets: it improves accuracy by 12.7% in both dialogue policy learning and product retrieval tasks, validating its efficacy for training robust, intent-aware e-commerce dialogue agents.
Abstract
The goal of conversational product search (CPS) is to develop an intelligent, chat-based shopping assistant that can directly interact with customers to understand shopping intents, ask clarification questions, and find relevant products. However, training such assistants is hindered mainly by the lack of reliable, large-scale datasets. Prior human-annotated CPS datasets are extremely small and lack integration with real-world product search systems. We propose a novel approach, TRACER, which leverages large language models (LLMs) to generate realistic and natural conversations for different shopping domains. TRACER's novelty lies in grounding the generation in dialogue plans: product search trajectories, predicted by a decision tree model, that guarantee relevant product discovery in the fewest search conditions. We also release the first target-oriented CPS dataset, Wizard of Shopping (WoS), containing 3.6k highly natural and coherent conversations from three shopping domains. Finally, we demonstrate the quality and effectiveness of WoS via human evaluations and downstream tasks.
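To make the idea of a dialogue plan as a short chain of search conditions more concrete, here is a minimal Python sketch. It is not TRACER's decision tree model: it swaps in a simple greedy heuristic that, at each step, picks the attribute condition leaving the fewest candidate products. The function `plan_search_path`, the toy catalog, and the attribute names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Product = Dict[str, str]


def plan_search_path(
    catalog: List[Product], target: Product, max_results: int = 1
) -> List[Tuple[str, str]]:
    """Greedily choose (attribute, value) conditions that isolate the target.

    Illustrative heuristic only (an assumption, not the paper's method):
    at each step, apply the target's attribute value that leaves the
    fewest remaining candidates.
    """
    candidates = list(catalog)
    unused = [attr for attr in target if attr != "id"]
    path: List[Tuple[str, str]] = []

    while len(candidates) > max_results and unused:
        # Pick the attribute whose target value matches the fewest candidates.
        best = min(
            unused,
            key=lambda a: sum(p.get(a) == target[a] for p in candidates),
        )
        remaining = [p for p in candidates if p.get(best) == target[best]]
        if len(remaining) == len(candidates):
            break  # no remaining attribute narrows the search further
        path.append((best, target[best]))
        candidates = remaining
        unused.remove(best)

    return path


# Hypothetical toy catalog; real product search systems are far larger.
catalog = [
    {"id": "p1", "category": "shoes", "color": "red", "brand": "A"},
    {"id": "p2", "category": "shoes", "color": "blue", "brand": "A"},
    {"id": "p3", "category": "shoes", "color": "red", "brand": "B"},
    {"id": "p4", "category": "bags", "color": "red", "brand": "B"},
]

plan = plan_search_path(catalog, target=catalog[0])
print(plan)
```

In this toy case the plan needs only two conditions (brand, then color) rather than all three attributes. A plan like this could then serve as the skeleton that a generated conversation follows, with each condition corresponding to one clarification exchange between the customer and the assistant.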