Unlocking Proactivity in Task-Oriented Dialogue

2026-05-21
Citations: 0
Influential: 0

AI Summary
This work addresses the challenge that large language models in task-oriented dialogue often adopt overly conservative strategies, failing to proactively uncover users' implicit concerns or effectively guide conversations toward successful outcomes. To overcome this limitation, the study introduces a novel framework that explicitly models users' implicit concerns as a critical training signal. It proposes a cognitive user simulator coupled with an asymmetric-perspective policy optimization architecture, integrating online self-distillation, state-transition fine-tuning, and reinforcement learning to transcend the constraints of conventional passive sampling. This approach generates high-fidelity, diverse dialogue interactions and significantly enhances the agent's ability to actively understand and persuasively engage users within a limited number of turns.
Abstract
Proactive task-oriented dialogue (TOD), such as outbound sales, demands a persuasive agent that actively probes the user's concerns and steers the conversation toward acceptance within a bounded number of turns. Yet post-trained LLMs are inherently conservative, and reward-shaping RL (e.g., GRPO) struggles because it only re-weights what an already passive policy samples. We show that conditioning on the user's latent concerns unlocks proactive capability that no amount of sampling can elicit, establishing these concerns as a pivotal training-time signal. To operationalize this finding, we build the Cognitive User Simulator, which models each user as a stratified persona comprising observable external traits and hidden internal concerns. The simulator produces faithful and diverse interactions, while emitting per-turn state dynamics that track persuasion progress. We then introduce Simulator-Induced Asymmetric-View Policy Optimization, which converts the modeled concerns and the simulation state transition into complementary training objectives: (1) Asymmetric On-Policy Self-Distillation that transfers concern-aware behavior from a privileged view of the same policy into its deployable, conversation-only view; and (2) State-Transition Policy Refinement ...
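The asymmetric-view idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the `Persona` fields, the prompt layout, the three action labels, the probability values, and the choice of KL(teacher || student) as the distillation loss are all assumptions made for illustration. It shows only the general shape: the same policy is queried once with the hidden concerns visible (privileged view) and once with the conversation alone (deployable view), and the deployable view is pulled toward the privileged one.

```python
import math
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical stratified persona: observable traits plus hidden concerns."""
    external_traits: dict[str, str]   # visible to the agent
    internal_concerns: list[str]      # hidden from the deployed agent


def deployable_view(history: list[str]) -> str:
    """Conversation-only context, i.e. what the agent sees at deployment."""
    return "\n".join(history)


def privileged_view(history: list[str], persona: Persona) -> str:
    """Same conversation, prefixed with the simulator's hidden concerns."""
    concerns = "; ".join(persona.internal_concerns)
    return f"[hidden concerns: {concerns}]\n{deployable_view(history)}"


def kl_divergence(teacher: dict[str, float], student: dict[str, float]) -> float:
    """KL(teacher || student) over a shared discrete action set."""
    return sum(
        p * math.log(p / student[action])
        for action, p in teacher.items()
        if p > 0.0
    )


persona = Persona(
    external_traits={"age_group": "30-40", "tone": "polite"},
    internal_concerns=["monthly price", "contract length"],
)
history = [
    "Agent: Hello, do you have a minute?",
    "User: I'm not sure this is for me.",
]

# Made-up action distributions standing in for the policy's output under
# each view; a real system would obtain these from the language model.
teacher_probs = {"probe_concern": 0.7, "repeat_offer": 0.2, "end_call": 0.1}
student_probs = {"probe_concern": 0.2, "repeat_offer": 0.5, "end_call": 0.3}

# Assumed distillation signal: large when the conversation-only view
# behaves differently from the concern-aware view.
distill_loss = kl_divergence(teacher_probs, student_probs)
print(f"distillation loss: {distill_loss:.4f}")
```

In this toy setup the loss is zero only when both views assign the same probabilities, which is the sense in which concern-aware behavior is transferred into the conversation-only view.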
Problem

Research questions and friction points this paper is trying to address.

proactive dialogue
task-oriented dialogue
latent user concerns
persuasive agent
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

proactive dialogue
latent user concerns
cognitive user simulator
asymmetric policy distillation
state-transition optimization