🤖 AI Summary
To address the blind trial-and-error behavior and hallucinated actions that large language model (LLM) agents exhibit in long-horizon tasks, both of which stem from insufficient global planning, this paper proposes EAGLET. The method trains a planner in two stages. First, it synthesizes plans with an advanced LLM, filters them with a homologous consensus filtering strategy, and uses the retained plans for supervised fine-tuning as a cold start. Second, it refines the planner with rule-based reinforcement learning driven by an executor capability gain reward. Neither stage requires human annotation. The result is a plug-and-play, lightweight global planner. On three long-horizon benchmarks, executor agents equipped with this planner achieve state-of-the-art performance, and EAGLET reduces training cost by 8× compared to RL-based baselines.
📝 Abstract
Agents based on large language models (LLMs) fall into blind trial-and-error and generate hallucinated actions in long-horizon tasks because they lack global planning. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method that enhances the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner through a two-step process. We first synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy and apply fine-tuning as a cold start. We then improve the planner with a rule-based reinforcement learning stage using a novel executor capability gain reward, ensuring it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods, achieving new state-of-the-art performance. EAGLET also reduces training costs by 8× compared to RL-based baselines and requires neither manual effort nor extra training data.
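The abstract does not define the executor capability gain reward or the consensus filter, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes the reward is the change in an executor's success rate when a plan is supplied, and that the filter keeps a plan only if no executor in a given set is hurt by it. The `Executor` interface and the names `success_rate`, `capability_gain`, and `consensus_filter` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical executor interface (an assumption, not from the paper):
# takes a task instruction and an optional plan, returns True on success.
Executor = Callable[[str, Optional[str]], bool]


def success_rate(executor: Executor, task: str, plan: Optional[str],
                 rollouts: int = 4) -> float:
    """Fraction of rollouts in which the executor completes the task."""
    return sum(executor(task, plan) for _ in range(rollouts)) / rollouts


def capability_gain(executor: Executor, task: str, plan: str,
                    rollouts: int = 4) -> float:
    """Assumed reward: success rate with the plan minus success rate without it."""
    with_plan = success_rate(executor, task, plan, rollouts)
    without_plan = success_rate(executor, task, None, rollouts)
    return with_plan - without_plan


def consensus_filter(task: str, candidate_plans: Sequence[str],
                     executors: Sequence[Executor],
                     rollouts: int = 4) -> List[str]:
    """Assumed consensus rule: keep plans that do not hurt any executor."""
    return [
        plan for plan in candidate_plans
        if all(capability_gain(e, task, plan, rollouts) >= 0 for e in executors)
    ]
```

Under this reading, the retained plans would feed the supervised cold start, and `capability_gain` would serve as the scalar reward during the reinforcement learning stage. Because the reward is a difference against the executor's own no-plan baseline, a plan earns credit only when it actually changes the outcome.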