🤖 AI Summary
LLM agents face two challenges in proactive, goal-driven dialogue: myopic decision-making and high planning overhead. This paper proposes an online value-based reinforcement learning framework that freezes the LLM to generate high-quality candidate actions and employs a lightweight Q-network for value-driven selection under affect-aware constraints. It leverages fixed BERT embeddings, temporal-difference Q-learning, multi-turn emotion tracking, and dynamic reward shaping to enable low-overhead, real-time dialogue planning. The core contribution is the first integration of LLM priors with user emotion modeling within the value-learning process, enhancing both task success and empathic quality. Experiments across negotiation, emotional support, and tutoring tasks show goals achieved in ≤3 turns on average with success rates of ≥94%; incorporating upgraded LLM priors further boosts success above 97% and markedly improves negotiation outcomes.
📝 Abstract
Large-language-model (LLM) agents excel at reactive dialogue but struggle with proactive, goal-driven interactions due to myopic decoding and costly planning. We introduce DialogXpert, which leverages a frozen LLM to propose a small, high-quality set of candidate actions per turn and employs a compact Q-network over fixed BERT embeddings trained via temporal-difference learning to select optimal moves within this reduced space. By tracking the user's emotions, DialogXpert tailors each decision to advance the task while nurturing a genuine, empathetic connection. Across negotiation, emotional support, and tutoring benchmarks, DialogXpert drives conversations to success in under 3 turns with success rates exceeding 94% and, with a larger LLM prior, pushes success above 97% while markedly improving negotiation outcomes. This framework delivers real-time, strategic, and emotionally intelligent dialogue planning at scale. Code available at https://github.com/declare-lab/dialogxpert/
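The abstract's core loop can be sketched in a few lines: a frozen LLM proposes candidate actions, fixed embeddings encode the state and each candidate, and a lightweight Q-head scores them and is updated with temporal-difference learning. Below is a minimal, illustrative sketch only; the paper uses frozen BERT embeddings (768-dim) and its own reward shaping, whereas here `embed`, the embedding size, and the linear Q-head are simplified placeholders, not the authors' implementation.

```python
import numpy as np

EMB_DIM = 8  # stand-in for a fixed BERT embedding size (768 in the paper)

def embed(text: str) -> np.ndarray:
    """Placeholder for a frozen encoder: maps text to a fixed vector.
    (Deterministic within a run; in DialogXpert this is frozen BERT.)"""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(EMB_DIM)

class LinearQ:
    """Lightweight Q-head sketch: a linear function over concatenated
    [state; action] embeddings, trained with TD(0) updates."""
    def __init__(self, dim: int, lr: float = 0.05, gamma: float = 0.95):
        self.w = np.zeros(2 * dim)
        self.lr, self.gamma = lr, gamma

    def q(self, s: np.ndarray, a_text: str) -> float:
        return float(self.w @ np.concatenate([s, embed(a_text)]))

    def select(self, s: np.ndarray, candidates: list[str]) -> str:
        # Value-driven selection among the LLM-proposed candidate actions.
        return max(candidates, key=lambda a: self.q(s, a))

    def td_update(self, s, a_text, reward, s_next, next_candidates) -> float:
        # TD target: reward plus discounted value of the best next action.
        target = reward + self.gamma * max(
            self.q(s_next, c) for c in next_candidates
        )
        feat = np.concatenate([s, embed(a_text)])
        td_err = target - self.w @ feat
        self.w += self.lr * td_err * feat  # gradient step on squared TD error
        return td_err
```

A usage turn would look like: the frozen LLM returns candidates such as `["make_offer", "express_empathy", "ask_question"]`, `select` picks the highest-valued one, and after observing the user's response (including its emotion-derived reward), `td_update` adjusts the Q-head; only the small head is trained, which is what keeps planning overhead low.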