🤖 AI Summary
To address the weak planning ability and low plan accuracy of large language models (LLMs) on long-horizon, multi-step tasks, this paper proposes a two-stage framework that decouples planning from execution: a Planner generates structured high-level plans, while an Executor translates them into environment-specific actions. The paper makes planning an explicit step and introduces a scalable synthetic data generation method that builds diverse plan trajectories annotated from ground-truth trajectories. By combining trajectory annotation, synthetic data generation, and training aimed at generalization, the method improves planning robustness. Evaluated on the WebArena-Lite benchmark, the approach achieves a 54% task success rate, which the paper reports as a new state of the art for long-horizon web navigation, and it offers a framework for more reliable long-horizon planning in LLM-based agents.
📝 Abstract
Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them to complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable synthetic data generation method to enhance plan generation. Plan-and-Act consists of a Planner model, which generates structured, high-level plans to achieve user goals, and an Executor model, which translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 54% success rate on the WebArena-Lite benchmark.
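The Planner/Executor split described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: in Plan-and-Act both roles are LLMs, whereas here `toy_planner`, `toy_executor`, `toy_env_step`, and `run_plan_and_act` are hypothetical stand-ins, and the plan format (a list of step strings) is an assumption made only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    """A structured high-level plan: the user goal plus ordered steps."""
    goal: str
    steps: List[str]


def toy_planner(goal: str) -> Plan:
    # Hypothetical stand-in for the Planner model: a fixed three-step plan.
    return Plan(
        goal=goal,
        steps=[
            "open the search page",
            f"search for '{goal}'",
            "open the first result",
        ],
    )


def toy_executor(step: str, observation: str) -> str:
    # Hypothetical stand-in for the Executor model: maps one plan step
    # (plus the current observation) to an environment-specific action.
    return f"ACT: {step}"


def toy_env_step(action: str) -> str:
    # Hypothetical environment: returns a new observation after an action.
    return f"page after <{action}>"


def run_plan_and_act(
    goal: str,
    planner: Callable[[str], Plan],
    executor: Callable[[str, str], str],
    env_step: Callable[[str], str],
    initial_observation: str = "blank page",
) -> List[str]:
    """Plan once, then execute each step against the environment."""
    plan = planner(goal)
    observation = initial_observation
    actions: List[str] = []
    for step in plan.steps:
        action = executor(step, observation)
        observation = env_step(action)
        actions.append(action)
    return actions


if __name__ == "__main__":
    for action in run_plan_and_act(
        "cheapest laptop", toy_planner, toy_executor, toy_env_step
    ):
        print(action)
```

The sketch keeps planning and execution behind separate callables, so either one could be swapped for a trained model without changing the loop; whether the actual system plans once or revises the plan during execution is not stated in the abstract.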