🤖 AI Summary
This work proposes an end-to-end generative framework for conversational recommendation. Existing approaches often decouple recommendation from dialog generation or rely on retrieval-based pipelines, which limits the integration between the two and leads to suboptimal modeling of user intent. The proposed method unifies item recommendation and natural language response generation within a single autoregressive model by representing items as discrete semantic IDs that are integrated directly into the generation process. It introduces a structured generation paradigm that decomposes the task into three interdependent steps: response intent identification, recommendation target prediction, and response generation conditioned on both. Constrained decoding is used to ensure faithful item generation. Experiments show gains of up to 29% in Recall@1 over strong baselines while maintaining competitive dialog quality.
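One way to read the three-step decomposition above is as a chain-rule factorization of a single autoregressive distribution. The notation here is an assumption introduced for illustration and does not come from the paper: $c$ is the dialog context, $a$ the response intent, $i = (s_1, \dots, s_K)$ the semantic-ID tokens of the recommendation target, and $y = (y_1, \dots, y_T)$ the response tokens.

```latex
\begin{align}
p(a, i, y \mid c) &= p(a \mid c)\; p(i \mid a, c)\; p(y \mid a, i, c), \\
p(i \mid a, c)    &= \prod_{k=1}^{K} p(s_k \mid s_{<k}, a, c), \\
p(y \mid a, i, c) &= \prod_{t=1}^{T} p(y_t \mid y_{<t}, a, i, c).
\end{align}
```

Under this reading, every factor is a next-token prediction over one shared vocabulary, which is what would allow a single training objective to cover both the recommendation and the response.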
📝 Abstract
Conversational recommender systems aim to provide personalized recommendations via natural language interactions. However, existing approaches either decouple recommendation from dialog generation or rely on retrieval-based pipelines, limiting the integration between recommendation and response generation and leading to suboptimal modeling of user intent. In this paper, we propose a fully generative conversational recommender system that unifies recommendation and dialog generation within a single autoregressive framework. Our approach represents items as discrete semantic IDs and integrates them directly into the generation process, enabling joint prediction of items and responses via next-token modeling. We further introduce a structured generation paradigm that factorizes conversational recommendation into a sequence of interdependent decisions, where the model first predicts the response intent and the recommendation target, and then generates the response conditioned on them. This design enables end-to-end optimization, enforces a more coherent dependency structure, and supports faithful item generation via constrained decoding. Extensive experiments demonstrate that our method consistently improves recommendation performance, achieving gains of up to 29% on Recall@1 over strong baselines, while maintaining competitive dialog quality.
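The constrained decoding mentioned above can be illustrated with a prefix trie over semantic-ID sequences. The following is a minimal sketch under stated assumptions, not the authors' implementation: the toy catalog, the three-token code length, the greedy search, and the `toy_scores` stand-in for the model's next-token scores are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_trie(codes: Sequence[Tuple[int, ...]]) -> dict:
    """Build a nested-dict prefix trie over valid semantic-ID sequences."""
    root: dict = {}
    for code in codes:
        node = root
        for token in code:
            node = node.setdefault(token, {})
    return root


def constrained_greedy_decode(
    score_fn: Callable[[List[int]], Dict[int, float]],
    trie: dict,
    length: int,
) -> Tuple[int, ...]:
    """Greedy decoding restricted to token sequences present in the trie."""
    prefix: List[int] = []
    node = trie
    for _ in range(length):
        allowed = list(node.keys())  # tokens that keep the prefix valid
        if not allowed:
            break
        scores = score_fn(prefix)
        # Choose the best-scoring token among the allowed ones only.
        best = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        prefix.append(best)
        node = node[best]
    return tuple(prefix)


# Hypothetical catalog: each item maps to a 3-token semantic ID.
catalog = {
    "item_a": (1, 4, 7),
    "item_b": (1, 5, 2),
    "item_c": (3, 4, 7),
}
code_to_item = {code: name for name, code in catalog.items()}


def toy_scores(prefix: List[int]) -> Dict[int, float]:
    """Stand-in for model scores; token 9 scores highest but is never valid."""
    return {9: 5.0, 1: 2.0, 5: 1.5, 4: 1.0, 3: 0.5, 2: 0.3, 7: 0.2}


trie = build_trie(list(catalog.values()))
decoded = constrained_greedy_decode(toy_scores, trie, length=3)
print(code_to_item[decoded])  # item_b
```

Without the trie, greedy decoding would pick token 9 at every step and produce a code that matches no item. Masking to the trie's children at each step guarantees that the decoded sequence corresponds to an item in the catalog. A real system would more likely combine this mask with beam search to return a ranked list of items.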