Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning

📅 2026-05-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing language model agents struggle to execute complex instructions efficiently in long-horizon tasks because their planning capabilities are insufficient. This work proposes a planner-centric multi-agent framework comprising a planner, an actor (executor), and a memory manager. Through a compute-allocation analysis, we show that the planning component predominantly governs overall performance, while execution and memory management need far less compute and model capacity. Leveraging this insight, we apply reinforcement learning exclusively to the planner, using trajectory-level rewards from a vision-language model (VLM) acting as a judge and keeping the other components frozen, so that computation is concentrated asymmetrically on planning. The resulting approach improves performance across benchmarks spanning web navigation, operating system control, and tool use, supporting the effectiveness and generalization of prioritizing high-level planning.
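As a rough illustration of the three-role decomposition, here is a minimal sketch that wires a planner, an actor, and a memory manager into one episode loop. It is not the authors' implementation: the planner and actor are plain Python callables standing in for LM calls, and `Step`, `Memory`, `run_episode`, `env_step`, and the `"DONE"` stop signal are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Step:
    subgoal: str       # high-level decision from the planner
    action: str        # low-level action from the actor
    observation: str   # environment feedback

@dataclass
class Memory:
    """Hypothetical memory manager: keeps a bounded list of step summaries."""
    max_items: int = 3
    notes: List[str] = field(default_factory=list)

    def update(self, step: Step) -> None:
        self.notes.append(f"{step.subgoal} -> {step.observation}")
        self.notes = self.notes[-self.max_items:]  # drop oldest notes beyond the budget

    def context(self) -> str:
        return " | ".join(self.notes)

def run_episode(
    task: str,
    planner: Callable[[str, str], str],
    actor: Callable[[str], str],
    env_step: Callable[[str], str],
    memory: Memory,
    max_steps: int = 10,
) -> List[Step]:
    trajectory: List[Step] = []
    for _ in range(max_steps):
        subgoal = planner(task, memory.context())  # planner: high-level decision
        if subgoal == "DONE":                      # hypothetical stop signal
            break
        action = actor(subgoal)                    # actor: turn the subgoal into an action
        observation = env_step(action)             # environment executes the action
        step = Step(subgoal, action, observation)
        trajectory.append(step)
        memory.update(step)                        # memory manager: refresh the context
    return trajectory

# Toy usage: a scripted planner with an echoing actor and environment.
plan = iter(["open settings", "toggle wifi", "DONE"])
traj = run_episode(
    task="turn off wifi",
    planner=lambda task, ctx: next(plan),
    actor=lambda subgoal: f"click({subgoal!r})",
    env_step=lambda action: f"ok:{action}",
    memory=Memory(),
)
print([s.action for s in traj])  # ["click('open settings')", "click('toggle wifi')"]
```

In this sketch only the planner reads the task and the memory context, so the actor can be a cheap component; whether the paper's actor also receives that context is not stated in this summary.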
📝 Abstract
Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task execution, and a memory manager for contextual reasoning. While this modular decomposition aligns with established design patterns, our core contribution lies in a systematic compute-allocation analysis, revealing that planning is the dominant factor influencing task performance. Execution and memory management require significantly less compute and model capacity to achieve competitive results. Building on these insights, we introduce a planner-centric reinforcement learning approach, which exclusively optimizes the planner using trajectory-level rewards from a VLM-as-judge, while freezing the other components. Extensive experiments on benchmarks spanning web navigation, OS control, and tool use demonstrate that concentrating model capacity and learning on high-level planning yields robust and compute-efficient improvements in long-horizon agent automation. Our code is publicly released.
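The planner-only training signal can be sketched as follows, assuming several rollouts per task, each scored once by a VLM judge. The abstract does not state which RL algorithm is used; the group-normalized advantage here (in the style of GRPO) is an assumption, and `Trajectory`, `planner_training_samples`, and the judge scores are hypothetical placeholders. What the sketch shows is that a single trajectory-level score becomes the learning signal for every planner decision, while actor outputs produce no training samples because that component stays frozen.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple

@dataclass
class Trajectory:
    planner_outputs: List[str]  # decisions from the trainable planner
    actor_outputs: List[str]    # from the frozen actor; never trained here
    judge_score: float          # trajectory-level reward, e.g. from a VLM-as-judge

def planner_training_samples(
    group: List[Trajectory], eps: float = 1e-6
) -> List[Tuple[str, float]]:
    """Return (planner decision, advantage) pairs for one group of rollouts."""
    scores = [t.judge_score for t in group]
    mu, sigma = mean(scores), pstdev(scores)
    samples: List[Tuple[str, float]] = []
    for traj in group:
        # Assumed group normalization: compare each rollout with its siblings.
        advantage = (traj.judge_score - mu) / (sigma + eps)
        # Broadcast the single trajectory-level advantage to every planner decision.
        for decision in traj.planner_outputs:
            samples.append((decision, advantage))
        # actor_outputs are skipped on purpose: the actor and memory manager stay frozen.
    return samples

# Toy usage with made-up judge scores for two rollouts of the same task.
group = [
    Trajectory(["search flights", "pick cheapest"], ["type(...)", "click(...)"], 1.0),
    Trajectory(["search hotels"], ["type(...)"], 0.0),
]
samples = planner_training_samples(group)
print([(d, round(a, 3)) for d, a in samples])
# [('search flights', 1.0), ('pick cheapest', 1.0), ('search hotels', -1.0)]
```

Because only planner decisions receive advantages, gradient updates (not shown) would touch the planner's parameters alone, which is the asymmetric allocation the abstract describes.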
Problem

Research questions and friction points this paper is trying to address.

long-horizon planning
multi-agent collaboration
language model agents
task automation
reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent collaboration
long-horizon planning
compute allocation
planner-centric reinforcement learning
VLM-as-judge