GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning

📅 2025-10-29
🤖 AI Summary
Problem: Existing LLM-based autonomous agents predominantly adopt sequential reasoning paradigms (e.g., ReAct), which fail to exploit the inherent parallelism among independent subtasks, resulting in inefficient tool invocation and suboptimal multi-step reasoning performance. Method: We propose Graph-based Agent Planning (GAP), an agent planning framework that explicitly decomposes tasks into dependency graphs, enabling adaptive parallel or sequential subtask scheduling. GAP is trained on graph-based planning trajectories constructed from multi-hop question answering data, using supervised fine-tuning followed by correctness-driven reinforcement learning. Contribution/Results: Experiments show that GAP outperforms ReAct baselines on multi-step reasoning tasks: tool invocations decrease by 32.7%, and task accuracy improves by 14.2%. These results suggest that graph-structured planning improves both the efficiency and the accuracy of LLM agents.

📝 Abstract
Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on sequential reasoning and execution, failing to exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck leads to inefficient tool utilization and suboptimal performance in multi-step reasoning scenarios. We introduce Graph-based Agent Planning (GAP), a novel framework that explicitly models inter-task dependencies through graph-based planning to enable adaptive parallel and serial tool execution. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs, autonomously determining which tools can be executed in parallel and which must follow sequential dependencies. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy. To train GAP, we construct a high-quality dataset of graph-based planning traces derived from the Multi-Hop Question Answering (MHQA) benchmark. We employ a two-stage training strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets demonstrate that GAP significantly outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while achieving dramatic improvements in tool invocation efficiency through intelligent parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.
Problem

Research questions and friction points this paper is trying to address.

Enables parallel tool execution for autonomous agents through graph-based planning
Overcomes sequential bottlenecks in multi-step reasoning scenarios for LLM agents
Improves tool utilization efficiency and task accuracy via dependency-aware orchestration
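
To make the idea of dependency-aware orchestration concrete, here is a minimal sketch of how a sub-task dependency graph could be grouped into levels and executed level by level, with each level running in parallel. It is an illustration under stated assumptions, not the paper's implementation: the plan format (a dict mapping each sub-task to its parents), the names `parallel_levels` and `run_plan`, and the `tool` callable are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_levels(deps):
    """Group sub-tasks into levels; every task in a level has all parents done.

    `deps` maps each sub-task id to the ids it depends on (assumed format).
    """
    remaining = {task: set(parents) for task, parents in deps.items()}
    levels = []
    while remaining:
        # Tasks with no unmet dependencies can run together.
        ready = sorted(t for t, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for parents in remaining.values():
            parents.difference_update(ready)
    return levels


def run_plan(deps, tool):
    """Run `tool(task, parent_results)` level by level, parallel within a level."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in parallel_levels(deps):
            inputs = [{p: results[p] for p in deps[task]} for task in level]
            outputs = pool.map(tool, level, inputs)
            results.update(zip(level, outputs))
    return results


# Hypothetical two-hop question: s1 and s2 are independent lookups, s3 needs both.
plan = {"s1": [], "s2": [], "s3": ["s1", "s2"]}
levels = parallel_levels(plan)
```

In this toy plan, a strictly sequential agent would take three tool-calling rounds, while the level-based schedule takes two, because `s1` and `s2` share a round.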
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based planning enables parallel tool execution
Dependency-aware sub-task graphs for adaptive orchestration
Two-stage training combines supervised and reinforcement learning
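
The abstract describes the RL stage as using a correctness-based reward but does not give its exact form here. As a rough sketch only, one common choice for question answering is a binary exact-match reward after light answer normalization. The normalization rules and the names `normalize` and `correctness_reward` below are assumptions for illustration, not the authors' definition.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def correctness_reward(prediction, gold_answers):
    """Return 1.0 if the prediction matches any gold answer, else 0.0 (assumed form)."""
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(gold) for gold in gold_answers) else 0.0
```

A reward of this shape depends only on the final answer, so it does not by itself favor parallel over sequential plans; any efficiency pressure would have to come from elsewhere in the training setup.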
Jiaqi Wu
Tsinghua University
Qinlao Zhao
Huazhong University of Science and Technology
Zefeng Chen
National University of Singapore
Kai Qin
Tsinghua University
Yifei Zhao
ShanghaiTech University
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing
Yuhang Yao
Carnegie Mellon University