🤖 AI Summary
Large language models (LLMs) exhibit limited planning capabilities in multi-step reasoning and goal-directed tasks. To address this, we propose the Modular Agentic Planner (MAP)—a cognitively inspired, reinforcement learning–informed framework that decomposes planning into specialized LLM modules: conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. These modules operate in a recurrent, collaborative loop to enable dynamic, adaptive planning. MAP requires no fine-tuning, transfers across tasks, and works with smaller, more cost-efficient LLMs (e.g., Llama3-70B). Empirical evaluation on graph traversal, Tower of Hanoi, PlanBench, and StrategyQA demonstrates substantial improvements over standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), yielding higher planning accuracy and robustness. Our core contribution is a systematic integration of modular, division-of-labor mechanisms into LLM-based planning, establishing a paradigm for structured, interpretable, and scalable agent reasoning.
📝 Abstract
Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. Both cognitive neuroscience and reinforcement learning (RL) have proposed a number of interacting functional components that together implement search and evaluation in multi-step decision making. These components include conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of these specialized modules, each implemented using an LLM. MAP breaks a larger problem down into multiple brief, automated calls to the LLM, distributed across the modules. We evaluate MAP on three challenging planning tasks -- graph traversal, Tower of Hanoi, and the PlanBench benchmark -- as well as an NLP task requiring multi-step reasoning (StrategyQA). We find that MAP yields significant improvements over both standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), can be effectively combined with smaller and more cost-efficient LLMs (Llama3-70B), and displays superior transfer across tasks. These results suggest the benefit of a modular and multi-agent approach to planning with LLMs.
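The recurrent interaction of modules described above can be illustrated with a minimal sketch. This is not the paper's implementation: the module names, interfaces, and the toy number-line task are hypothetical stand-ins, and each stub function below would in practice wrap a brief LLM call with a module-specific prompt.

```python
# Hypothetical sketch of a MAP-style planning loop on a toy number-line task.
# Each module is a stub; in MAP each would be a short, specialized LLM call.

def decompose(state):
    """TaskDecomposer (stub): propose candidate next subgoals from the state."""
    return [state - 1, state + 1]

def predict(state, subgoal):
    """StatePredictor (stub): simulate the state reached by pursuing a subgoal."""
    return subgoal

def evaluate(state, goal):
    """StateEvaluator (stub): score how close a predicted state is to the goal."""
    return -abs(goal - state)

def monitor(state, goal):
    """ConflictMonitor (stub): flag a remaining mismatch between state and goal."""
    return state != goal

def map_plan(start, goal, max_steps=10):
    """Orchestrator (stub): recurrently query the modules to build a plan."""
    state, plan = start, []
    for _ in range(max_steps):
        if not monitor(state, goal):   # no conflict left: goal reached
            break
        candidates = decompose(state)
        # choose the subgoal whose predicted outcome the evaluator scores best
        best = max(candidates, key=lambda s: evaluate(predict(state, s), goal))
        state = predict(state, best)
        plan.append(best)
    return plan

print(map_plan(0, 3))  # → [1, 2, 3]
```

The point of the sketch is the division of labor: no single call plans end-to-end; the orchestrating loop composes many small, specialized queries, which is the structure the abstract credits for MAP's gains over monolithic prompting.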