A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the blind trial-and-error behavior and hallucinated actions that large language model (LLM) agents exhibit in long-horizon tasks, both symptoms of weak global planning, this paper proposes EAGLET. The method operates in two stages: it first synthesizes high-quality planning data from an advanced LLM via a homologous consensus filtering mechanism and uses it for supervised fine-tuning as a cold start; it then applies rule-based reinforcement learning driven by an executor capability gain reward. Neither stage requires human annotation, and the result is a plug-and-play, lightweight global planner with an interpretable reward signal. Evaluated on three long-horizon benchmarks, EAGLET achieves state-of-the-art performance while cutting training cost by 8x relative to RL-based baselines, improving both planning reliability and training efficiency.

📝 Abstract
Agents based on large language models (LLMs) struggle with blind trial-and-error and hallucinatory actions due to a lack of global planning in long-horizon tasks. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method that enhances the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner through a two-step process: we first synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy, and apply fine-tuning as a cold start. We then further improve the planner with a rule-based reinforcement learning stage using a novel executor capability gain reward, ensuring it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods, achieving new state-of-the-art performance. Meanwhile, EAGLET reduces training costs by 8x compared to RL-based baselines, and it requires no manual effort or extra training data, offering an efficient and effective solution.
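The abstract names an "executor capability gain reward" but does not define it here. A minimal sketch, assuming the reward is simply the executor's success rate when given the plan minus its plan-free baseline (the function name and exact formula are assumptions, not the paper's specification):

```python
def capability_gain_reward(success_with_plan: float,
                           success_without_plan: float) -> float:
    """Rule-based reward: how much conditioning on a plan lifts the
    executor's success rate over its plan-free baseline."""
    return success_with_plan - success_without_plan

# An easy task the executor already solves contributes little reward,
# while a plan that rescues a hard task is rewarded strongly -- one way
# such a reward could handle "task instructions of varying difficulty".
easy_gain = capability_gain_reward(success_with_plan=1.0, success_without_plan=0.9)
hard_gain = capability_gain_reward(success_with_plan=0.8, success_without_plan=0.1)
```

Under this reading, the reward naturally concentrates learning signal on tasks where the planner actually changes the outcome.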
Problem

Research questions and friction points this paper is trying to address.

Enhancing global planning for long-horizon agent tasks
Reducing hallucinatory actions and trial-and-error in LLM agents
Training efficient planners without human effort or extra data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training plug-and-play planner via two-step process
Synthesizing high-quality plans using consensus filtering
Improving planner with rule-based reinforcement learning
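The homologous consensus filtering step is only named in this summary. A plausible minimal sketch, assuming it keeps a synthesized plan only when repeated rollouts of the same (homologous) executor all succeed under that plan; every name, signature, and the exact consensus criterion here are assumptions for illustration:

```python
from typing import Callable, List, Tuple

def homologous_consensus_filter(
    tasks: List[str],
    propose_plans: Callable[[str, int], List[str]],  # advanced LLM: k candidate plans per task
    rollout_succeeds: Callable[[str, str], bool],    # one executor rollout under a given plan
    k: int = 4,
    n_rollouts: int = 3,
) -> List[Tuple[str, str]]:
    """Keep (task, plan) pairs only when all homologous executor
    rollouts agree the plan leads to success."""
    kept = []
    for task in tasks:
        for plan in propose_plans(task, k):
            if all(rollout_succeeds(task, plan) for _ in range(n_rollouts)):
                kept.append((task, plan))
    return kept
```

The surviving pairs would then serve as the supervised fine-tuning data for the cold start, with no human annotation in the loop.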