Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent

📅 2025-06-17
🤖 AI Summary
To address myopic response generation, persistent misalignment with evolving user expectations, and high-cost, low-effectiveness preference optimization in multi-turn conversational recommendation, this paper proposes Expectation Confirmation-based Preference Optimization (ECPO), a novel multi-turn preference optimization paradigm grounded in expectation confirmation theory. ECPO is the first to integrate this psychological theory into conversational recommendation, enabling sampling-free, turn-level fine-grained satisfaction modeling and response optimization. It introduces AILO, an interpretable LLM-based user simulator, to support satisfaction attribution and feedback generation. Coupled with dialogue state satisfaction evolution analysis, ECPO forms an end-to-end Multi-Turn Preference Optimization (MTPO) framework. Extensive experiments on multiple benchmarks demonstrate that ECPO significantly improves recommendation effectiveness and interaction efficiency, reduces optimization overhead, and enhances long-term user satisfaction.

📝 Abstract
Recent advancements in Large Language Models (LLMs) have significantly propelled the development of Conversational Recommendation Agents (CRAs). However, these agents often generate short-sighted responses that fail to sustain user guidance or meet expectations. Although preference optimization has proven effective in aligning LLMs with user expectations, it remains costly and performs poorly in multi-turn dialogue. To address this challenge, we introduce a novel multi-turn preference optimization (MTPO) paradigm, ECPO, which leverages Expectation Confirmation Theory to explicitly model the evolution of user satisfaction throughout multi-turn dialogues, uncovering the underlying causes of dissatisfaction. These causes can be used to support targeted optimization of unsatisfactory responses, thereby achieving turn-level preference optimization. ECPO eliminates the significant sampling overhead of existing MTPO methods while ensuring that the optimization process drives meaningful improvements. To support ECPO, we introduce an LLM-based user simulator, AILO, to simulate user feedback and perform expectation confirmation during conversational recommendation. Experimental results show that ECPO significantly enhances the CRA's interaction capabilities, delivering notable improvements in both efficiency and effectiveness over existing MTPO methods.
Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-turn conversational recommendation agent responses
Reducing cost and improving performance in preference optimization
Modeling user satisfaction evolution to address dissatisfaction causes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn preference optimization using Expectation Confirmation Theory
LLM-based user simulator for feedback and confirmation
Targeted optimization of unsatisfactory dialogue responses
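
As a rough illustration of the turn-level idea described in the abstract, the sketch below builds preference pairs only for turns the simulated user found unsatisfactory, pairing the original response (rejected) with a revision that targets the stated cause (chosen). No candidate sampling is involved. This is not the authors' implementation: `Turn`, `PreferencePair`, `build_preference_pairs`, the 0 to 1 satisfaction scale, the threshold, and `toy_rewrite` are all hypothetical names and assumptions chosen for illustration. In practice the scores and causes would come from a user simulator such as AILO, and the rewrite step would be model-driven rather than a string template.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    context: str            # dialogue history up to this turn
    response: str           # the agent's original response
    satisfaction: float     # simulator score (assumed 0..1 scale)
    cause: Optional[str]    # simulator's stated cause of dissatisfaction


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_preference_pairs(
    turns: List[Turn],
    rewrite: Callable[[str, str, str], str],
    threshold: float = 0.5,  # assumed cutoff for "unsatisfactory"
) -> List[PreferencePair]:
    """Create one pair per unsatisfactory turn, with no response sampling."""
    pairs = []
    for turn in turns:
        # Skip turns that met expectations or have no attributed cause.
        if turn.satisfaction >= threshold or turn.cause is None:
            continue
        improved = rewrite(turn.context, turn.response, turn.cause)
        # Only keep the pair if the rewrite actually changed the response.
        if improved != turn.response:
            pairs.append(PreferencePair(turn.context, improved, turn.response))
    return pairs


def toy_rewrite(context: str, response: str, cause: str) -> str:
    # Placeholder for a model-driven rewrite (hypothetical).
    return f"{response} (revised to address: {cause})"


dialogue = [
    Turn("User: I want a sci-fi movie.", "What mood are you in?", 0.8, None),
    Turn("User: Something thoughtful.", "Try 'Inception'.", 0.3,
         "ignored the stated mood"),
]
pairs = build_preference_pairs(dialogue, toy_rewrite)
```

The resulting `prompt`/`chosen`/`rejected` records follow the shape commonly used by preference-optimization trainers, so a list like `pairs` could feed a training step of that kind.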
Xueyang Feng
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE
Jingsen Zhang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE
Jiakai Tang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Recommender Systems, Multi-Agent Systems
Wei Li
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE
Guohao Cai
Huawei Noah’s Ark Lab, China
Xu Chen
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE
Quanyu Dai
Huawei Noah’s Ark Lab, China
Yue Zhu
IBM Research
Performance Optimization, I/O, Storage, Cloud
Zhenhua Dong
Noah's Ark Lab, Huawei Technologies Co., Ltd.
Recommender system, causal inference, counterfactual learning, trustworthy AI, machine learning