DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors

📅 2025-05-23
🤖 AI Summary
LLM agents suffer from myopic decision-making and high planning overhead in proactive, goal-driven dialogue. This paper proposes an online value-based reinforcement learning framework in which a frozen LLM generates high-quality action candidates and a lightweight Q-network selects among them by value while accounting for the user's emotional state. It combines fixed BERT embeddings, temporal-difference Q-learning, multi-turn emotion tracking, and dynamic reward shaping to enable low-overhead, real-time dialogue planning. The core contribution is the first integration of LLM priors with user emotion modeling within the value-learning process, which improves both task success and empathic quality. Experiments across negotiation, emotional support, and tutoring tasks show goals reached in fewer than 3 turns on average with success rates above 94%; a larger LLM prior raises success above 97% and markedly improves negotiation outcomes.

📝 Abstract
Large-language-model (LLM) agents excel at reactive dialogue but struggle with proactive, goal-driven interactions due to myopic decoding and costly planning. We introduce DialogXpert, which leverages a frozen LLM to propose a small, high-quality set of candidate actions per turn and employs a compact Q-network over fixed BERT embeddings, trained via temporal-difference learning, to select optimal moves within this reduced space. By tracking the user's emotions, DialogXpert tailors each decision to advance the task while nurturing a genuine, empathetic connection. Across negotiation, emotional support, and tutoring benchmarks, DialogXpert completes conversations in under 3 turns with success rates exceeding 94% and, with a larger LLM prior, pushes success above 97% while markedly improving negotiation outcomes. This framework delivers real-time, strategic, and emotionally intelligent dialogue planning at scale. Code is available at https://github.com/declare-lab/dialogxpert/
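
The selection-and-update loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the compact Q-network for a linear Q-function so that it runs with the standard library alone, and it uses toy vectors in place of fixed BERT embeddings. The function names, the dictionary of candidates, and the hyperparameters `epsilon`, `alpha`, and `gamma` are all assumptions made for illustration.

```python
import random


def q_value(weights, state_vec, action_vec):
    """Linear Q(s, a) = w . [s ; a] over fixed (non-trainable) embeddings."""
    features = state_vec + action_vec  # list concatenation
    return sum(w * x for w, x in zip(weights, features))


def select_action(weights, state_vec, candidates, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the candidate set.

    `candidates` maps an action label to its embedding; in the setting the
    abstract describes, a frozen LLM would propose this small set each turn.
    """
    if rng.random() < epsilon:
        return rng.choice(list(candidates))
    return max(candidates,
               key=lambda a: q_value(weights, state_vec, candidates[a]))


def td_update(weights, state_vec, action_vec, reward, next_state_vec,
              next_candidates, done, alpha=0.1, gamma=0.9):
    """One temporal-difference (Q-learning) step; returns the TD error."""
    if done or not next_candidates:
        target = reward
    else:
        # Bootstrap from the best candidate available in the next turn.
        target = reward + gamma * max(
            q_value(weights, next_state_vec, vec)
            for vec in next_candidates.values())
    td_error = target - q_value(weights, state_vec, action_vec)
    for i, x in enumerate(state_vec + action_vec):
        weights[i] += alpha * td_error * x  # gradient step for a linear Q
    return td_error


if __name__ == "__main__":
    # Toy 2-d vectors stand in for BERT embeddings (assumption).
    state = [1.0, 0.0]
    candidates = {"empathize": [1.0, 0.0], "counter_offer": [0.0, 1.0]}
    weights = [0.0] * 4
    td_update(weights, state, candidates["empathize"], reward=1.0,
              next_state_vec=state, next_candidates={}, done=True)
    print(select_action(weights, state, candidates, epsilon=0.0))
```

Because the maximization only ranges over the handful of candidates, each turn costs a few Q-evaluations rather than a search over the full action space, which is the low-overhead property the abstract emphasizes.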
Problem

Research questions and friction points this paper is trying to address.

Enabling proactive, goal-driven dialogue despite myopic LLM decoding and costly planning
Selecting optimal conversational moves efficiently with a compact Q-network over fixed BERT embeddings
Tracking user emotions so that task-oriented interactions remain empathetic
Innovation

Methods, ideas, or system contributions that make the work stand out.

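Uses a frozen LLM to propose a small, high-quality set of candidate actions per turn
Employs a compact Q-network over fixed BERT embeddings, trained via temporal-difference learning
Tracks user emotions to tailor each decision toward an empathetic connection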
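
The summary mentions multi-turn emotion tracking and dynamic reward shaping but does not say how the two are combined. The sketch below shows one hypothetical way to do it: keep a short window of emotion valences and add their trend to the task reward. The emotion labels, the valence scores, the window size, and the `weight` parameter are all assumptions, not details from the paper.

```python
from collections import deque

# Hypothetical valence scores; the source does not list its emotion labels.
VALENCE = {"angry": -1.0, "sad": -0.5, "neutral": 0.0,
           "hopeful": 0.5, "happy": 1.0}


class EmotionTracker:
    """Keeps the valence of the user's last few turns."""

    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def observe(self, label):
        # Unknown labels are treated as neutral (assumption).
        self.history.append(VALENCE.get(label, 0.0))

    def trend(self):
        """Change in valence from the oldest to the newest tracked turn."""
        if len(self.history) < 2:
            return 0.0
        return self.history[-1] - self.history[0]


def shaped_reward(task_reward, tracker, weight=0.5):
    """Task reward plus a bonus (or penalty) for the emotional trend."""
    return task_reward + weight * tracker.trend()
```

For example, after observing `"sad"`, `"neutral"`, and `"happy"` in that order, `trend()` is positive, so `shaped_reward` exceeds the raw task reward; a worsening mood would lower it instead.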