🤖 AI Summary
Goal-oriented open-ended dialogue systems struggle to simultaneously achieve user personalization, phase adaptability, and low-data learning. Method: This paper proposes a novel framework integrating large language models (LLMs) with a hierarchical reinforcement learning (HRL)-based dialogue manager. Specifically: (i) HRL models multi-phase dialogue policies to enable smooth, goal-driven transitions across phases; (ii) a meta-learning mechanism enables rapid personalization across diverse user profiles; and (iii) an LLM–HRL co-architecture decouples semantic generation from policy decision-making, reducing reliance on annotated dialogue data. Results: Evaluated on motivational interviewing tasks, the proposed dialogue manager achieves significantly higher reward scores than state-of-the-art LLM-based baselines, demonstrating superior performance in goal completion rate, user adaptability, and data efficiency.
📝 Abstract
In this work, we propose a novel framework that integrates large language models (LLMs) with an RL-based dialogue manager for open-ended dialogue with a specific goal. By leveraging hierarchical reinforcement learning to model the structured phases of dialogue and employing meta-learning to adapt across diverse user profiles, our approach improves adaptability and efficiency, enabling the system to learn from limited data, transition fluidly between dialogue phases, and personalize responses to heterogeneous patient needs. We apply our framework to Motivational Interviewing, aiming to foster behavior change, and demonstrate that the proposed dialogue manager outperforms a state-of-the-art LLM baseline in terms of reward, showing a potential benefit of conditioning LLMs to create open-ended dialogue systems with specific goals.
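The decoupling described above — an HRL dialogue manager deciding *what* to do (phase and dialogue act) while an LLM decides *how* to say it — can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the phase names follow the standard Motivational Interviewing stages, the policies are toy stand-ins for learned HRL policies, and `llm_generate` is a stub for a prompt-conditioned LLM call.

```python
import random

# Illustrative MI phases and per-phase dialogue acts (assumed, not from the paper).
PHASES = ["engaging", "focusing", "evoking", "planning"]
ACTS = {
    "engaging": ["open_question", "affirmation"],
    "focusing": ["agenda_setting", "reflection"],
    "evoking": ["elicit_change_talk", "reflection"],
    "planning": ["summarize", "commitment_question"],
}

class HighLevelPolicy:
    """Top-level HRL policy: selects the current phase (the 'option')."""
    def select_phase(self, turn_index):
        # Toy heuristic standing in for a learned policy:
        # advance through phases as the dialogue progresses.
        return PHASES[min(turn_index // 2, len(PHASES) - 1)]

class LowLevelPolicy:
    """Sub-policy: selects a dialogue act within the current phase."""
    def select_act(self, phase, rng):
        return rng.choice(ACTS[phase])

def llm_generate(phase, act, user_utterance):
    # Stub for the LLM surface realizer; a real system would prompt
    # an LLM conditioned on the chosen (phase, act) pair.
    return f"[{phase}/{act}] I hear you saying: {user_utterance!r}"

def dialogue_turn(turn_index, user_utterance, hi, lo, rng):
    """One turn: the manager decides, the LLM verbalizes."""
    phase = hi.select_phase(turn_index)
    act = lo.select_act(phase, rng)
    return phase, act, llm_generate(phase, act, user_utterance)
```

Because the policies emit only symbolic (phase, act) decisions, they can be trained from reward signals with far less annotated dialogue data than end-to-end generation would need, which is the data-efficiency argument the abstract makes.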