Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance

πŸ“… 2025-05-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing open-domain dialogue systems exhibit limited capability in proactively identifying user preferences and steering conversations toward user-centered topics, leaving users feeling neglected and reducing satisfaction. To address this, the authors propose a user-oriented proactive dialogue framework featuring: (1) a critic-guided proactivity enhancement paradigm, in which an LLM-as-a-judge critic evaluates and guides response generation; (2) ISCO-800, a diverse user-background dataset for constructing user agents; and (3) an iterative curriculum learning strategy that orders training by communication difficulty, from easy-to-communicate users to more challenging ones. Experiments demonstrate improvements across multiple large language models in user-oriented proactivity and conversational attractiveness, with the training method generalizing across diverse open-domain scenarios.

πŸ“ Abstract
Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has greatly advanced this field by improving context understanding and conversational fluency. However, existing LLM-based dialogue systems often fall short in proactively understanding the user's chatting preferences and guiding conversations toward user-centered topics. This lack of user-oriented proactivity can leave users feeling unappreciated, reducing their satisfaction and willingness to continue the conversation in human-computer interactions. To address this issue, we propose a User-oriented Proactive Chatbot (UPC) to enhance user-oriented proactivity. Specifically, we first construct a critic, inspired by the LLM-as-a-judge strategy, to evaluate this proactivity. Given the scarcity of high-quality training data, we then employ the critic to guide dialogues between the chatbot and user agents, generating a corpus with enhanced user-oriented proactivity. To ensure the diversity of user backgrounds, we introduce ISCO-800, a diverse user-background dataset for constructing user agents. Moreover, considering that communication difficulty varies among users, we propose an iterative curriculum learning method that trains the chatbot on easy-to-communicate users before more challenging ones, thereby gradually enhancing its performance. Experiments demonstrate that our proposed training method is applicable to different LLMs, improving user-oriented proactivity and attractiveness in open-domain dialogues.
Problem

Research questions and friction points this paper is trying to address.

Enhancing user-oriented proactivity in open-domain dialogues
Addressing lack of user-centered topic guidance in chatbots
Improving conversational satisfaction with diverse user backgrounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses critic guidance to enhance user-oriented proactivity
Introduces ISCO-800 for diverse user background simulation
Employs iterative curriculum learning for gradual performance improvement
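The pipeline sketched in these bullets can be illustrated with a toy loop. All class and function names below (`User`, `critic_score`, `generate_candidates`, `best_response`, `curriculum`) are illustrative assumptions for exposition, not the paper's actual implementation; the critic here is a trivial heuristic standing in for an LLM-as-a-judge.

```python
# Hypothetical sketch of critic-guided response selection plus
# difficulty-ordered curriculum training, as described above.
# All names are illustrative assumptions, not the paper's API.
from dataclasses import dataclass


@dataclass
class User:
    background: str    # user background, e.g. an ISCO-style occupation
    difficulty: float  # communication difficulty, 0 (easy) .. 1 (hard)


def critic_score(response: str, user: User) -> float:
    """Stand-in for the LLM-as-a-judge critic: a toy heuristic that
    rewards responses engaging with the user's background."""
    return 1.0 if user.background.lower() in response.lower() else 0.0


def generate_candidates(user: User) -> list[str]:
    """Stand-in for sampling several chatbot responses."""
    return [
        "Tell me more about yourself.",
        f"Since you work in {user.background}, what drew you to it?",
    ]


def best_response(user: User) -> str:
    """Critic-guided generation: keep the candidate the critic rates highest."""
    return max(generate_candidates(user), key=lambda r: critic_score(r, user))


def curriculum(users: list[User]) -> list[User]:
    """Iterative curriculum: train on easy-to-communicate users first."""
    return sorted(users, key=lambda u: u.difficulty)
```

In the paper's setting, the critic-approved dialogues would form the training corpus, and the curriculum ordering would decide which simulated users the chatbot is fine-tuned on in each iteration.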
Yufeng Wang
South China University of Technology, Peng Cheng Laboratory, Pazhou Laboratory
Jinwu Hu
South China University of Technology; Pazhou Lab
Large Language Models Β· Computer Vision Β· Reinforcement Learning
Ziteng Huang
South China University of Technology
Kunyang Lin
Tencent AI
Embodied AI Β· Embodied Navigation Β· Reinforcement Learning
Zitian Zhang
South China University of Technology
Peihao Chen
Researcher at Robotics X Lab, Tencent
Embodied AI Β· Multi-Modal Video Understanding
Yu Hu
Hong Kong Polytechnic University
Qianyue Wang
South China University of Technology
Zhuliang Yu
South China University of Technology
Bin Sun
Hunan University
Xiaofen Xing
South China University of Technology
Qingfang Zheng
Peng Cheng Laboratory
Mingkui Tan
South China University of Technology
Machine Learning Β· Large-scale Optimization