🤖 AI Summary
This paper addresses the challenge of enabling large language models (LLMs) to support multi-turn interaction that spans days, weeks, or months for long-horizon personal goal planning (e.g., academic tutoring, health management). We propose a hierarchical dialog-based planning system built on a novel “LLM meta-controller + tool-augmented option policy” architecture. The approach integrates hierarchical task planning, dialog state tracking, and user-feedback-driven iterative optimization to enable natural-language plan generation, dynamic adaptation, and executable scheduling. Our core contribution lies in making long-term planning explainable, responsive to feedback, and adaptable. Empirical evaluation across academic and health domains demonstrates significant improvements: +28% in plan feasibility, +35% in user engagement, and +41% in goal attainment rate, validating the system’s effectiveness in sustaining meaningful, goal-oriented human–LLM collaboration over extended periods.
📝 Abstract
The language generation and reasoning capabilities of large language models (LLMs) have enabled conversational systems with impressive performance in a variety of tasks, from code generation, to composing essays, to passing STEM and legal exams, to a new paradigm for knowledge search. Beyond these short-term applications, LLMs are increasingly used to help with real-life goals or tasks that take a long time to complete, involving multiple sessions across days, weeks, months, or even years. Thus, to enable conversational systems for long-term interactions and tasks, we need language-based agents that can plan over long horizons. Traditionally, such capabilities were addressed by reinforcement learning agents with hierarchical planning capabilities. In this work, we explore a novel architecture in which the LLM acts as the meta-controller, deciding the agent's next macro-action, and tool-augmented, LLM-based option policies execute the selected macro-action. We instantiate this framework for a specific set of macro-actions that enable adaptive planning for users' personal plans through conversation and follow-up questions that collect user feedback. We show how this paradigm applies to scenarios ranging from tutoring for academic and non-academic tasks to conversational coaching for personal health plans.
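The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the macro-action names (`create_plan`, `ask_follow_up`, `adapt_plan`), the `DialogState` fields, and the rule-based `meta_controller` are all hypothetical stand-ins, and every function that would call an LLM or a tool in a real system is replaced by a deterministic stub so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# NOTE: every name here is an illustrative assumption, not taken from the paper.


@dataclass
class DialogState:
    """Minimal state carried across sessions."""
    goal: str
    plan: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


# An option policy executes one macro-action and returns the agent's utterance.
OptionPolicy = Callable[[DialogState], str]


def create_plan(state: DialogState) -> str:
    # Stub for a tool-augmented LLM call that drafts an initial plan.
    state.plan = [f"Step {i} toward: {state.goal}" for i in (1, 2, 3)]
    return "Here is a first plan: " + "; ".join(state.plan)


def ask_follow_up(state: DialogState) -> str:
    # Stub for an LLM call that elicits user feedback on the current plan.
    return f"How did '{state.plan[0]}' go since we last spoke?"


def adapt_plan(state: DialogState) -> str:
    # Stub for an LLM call that revises the plan using collected feedback.
    latest = state.feedback[-1]
    state.feedback.clear()  # mark all pending feedback as handled
    state.plan[0] = f"{state.plan[0]} (revised after feedback: {latest})"
    return "I updated the plan: " + state.plan[0]


OPTIONS: Dict[str, OptionPolicy] = {
    "create_plan": create_plan,
    "ask_follow_up": ask_follow_up,
    "adapt_plan": adapt_plan,
}


def meta_controller(state: DialogState) -> str:
    """Choose the next macro-action.

    A real system would prompt an LLM with the dialog state here;
    these fixed rules only stand in for that decision.
    """
    if not state.plan:
        return "create_plan"
    if state.feedback:
        return "adapt_plan"
    return "ask_follow_up"


def step(
    state: DialogState,
    controller: Callable[[DialogState], str] = meta_controller,
    options: Dict[str, OptionPolicy] = OPTIONS,
) -> Tuple[str, str]:
    """One turn: the controller picks a macro-action, its option policy runs it."""
    action = controller(state)
    utterance = options[action](state)
    return action, utterance
```

Calling `step` repeatedly on the same `DialogState`, and appending user replies to `state.feedback` between calls, mimics a multi-session interaction: the plan is created once, followed up on, and revised when feedback arrives. Keeping the controller and the option policies as separate callables mirrors the hierarchical split in the abstract and lets either layer be swapped for an LLM-backed version independently.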