Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tool-augmented LLM frameworks (e.g., ReAct) rely on incremental, step-by-step decision-making, making them prone to local optima and ill-suited for complex queries requiring coordinated multi-tool execution. Method: We propose a planner-centric Plan-Execute architecture featuring a global directed acyclic graph (DAG)-based planning mechanism, enabling end-to-end, interpretable, multi-step tool invocation reasoning. Contribution/Results: We introduce the first planner-centric paradigm for complex tool orchestration and establish ComplexTool-Plan, the first benchmark explicitly designed for evaluating advanced planning capabilities. We further design a two-stage training strategy comprising supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO). Experiments demonstrate state-of-the-art performance on StableToolBench, with significant gains in execution success rate for complex queries and in the robustness of multi-tool workflows, advancing tool-augmented reasoning toward systematic, structured, and controllable paradigms.

📝 Abstract
Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), systematically enhancing the Planner's tool selection accuracy and global planning awareness through structured DAG-based planning. When integrated with a capable executor, our framework achieves state-of-the-art performance on the StableToolBench benchmark for complex user queries, demonstrating superior end-to-end execution capabilities and robust handling of intricate multi-tool workflows.
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations in tool-augmented LLMs for complex queries
Proposes Planner-centric framework to overcome local optimization traps
Enhances multi-tool coordination through global DAG planning methodology
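
To make the idea of a global DAG plan concrete, here is a minimal sketch of how an executor might run such a plan once a planner has produced it. It is not the paper's implementation: the plan schema (`tool`, `args`, `deps`), the three toy tools, and the `execute_plan` helper are all hypothetical names chosen for illustration. The only library call used is `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical tools; names and signatures are illustrative only.
def search_flights(city):
    return f"flights to {city}"

def search_hotels(city):
    return f"hotels in {city}"

def build_itinerary(flights, hotels):
    return f"itinerary: {flights} + {hotels}"

TOOLS = {
    "search_flights": search_flights,
    "search_hotels": search_hotels,
    "build_itinerary": build_itinerary,
}

# Assumed plan schema: each node names a tool, its literal arguments,
# and a mapping from parameter name to the upstream node that supplies it.
PLAN = {
    "flights": {"tool": "search_flights", "args": {"city": "Paris"}, "deps": {}},
    "hotels": {"tool": "search_hotels", "args": {"city": "Paris"}, "deps": {}},
    "trip": {
        "tool": "build_itinerary",
        "args": {},
        "deps": {"flights": "flights", "hotels": "hotels"},
    },
}

def execute_plan(plan, tools):
    """Run every node of the plan after all of its dependencies have run."""
    # TopologicalSorter expects a mapping of node -> set of predecessor nodes.
    graph = {node: set(spec["deps"].values()) for node, spec in plan.items()}
    results = {}
    # static_order() yields predecessors first and raises CycleError on a cycle.
    for node in TopologicalSorter(graph).static_order():
        spec = plan[node]
        kwargs = dict(spec["args"])
        for param, upstream in spec["deps"].items():
            kwargs[param] = results[upstream]
        results[node] = tools[spec["tool"]](**kwargs)
    return results

results = execute_plan(PLAN, TOOLS)
print(results["trip"])
```

Because the whole graph is fixed before any tool runs, independent nodes such as `flights` and `hotels` could in principle be dispatched in parallel, which a step-by-step loop in the ReAct style does not expose as directly.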
Innovation

Methods, ideas, or system contributions that make the work stand out.

Planner-centric Plan-Execute paradigm for LLM reasoning
Global DAG planning for complex multi-tool queries
Two-stage training with SFT and GRPO optimization
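
As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of several sampled outputs for the same query against the group's own mean and standard deviation, as GRPO is commonly described. The function name, the example rewards, and the use of the population standard deviation are assumptions; the reward design used for the Planner is not given in this summary.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to the other samples in its group.

    rewards: one scalar reward per sampled plan for the same query.
    eps: small constant (an assumption) that avoids division by zero
         when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four candidate plans sampled for one query.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Plans that score above the group average receive a positive advantage and those below it a negative one, so no separate learned value model is needed to provide a baseline.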
Xiaolong Wei
Beihang University
Yuehu Dong
Baidu Inc.
Xingliang Wang
Beijing University of Posts and Telecommunications
Xingyu Zhang
Horizon Robotics Inc
NLP, VLM, AD
Zhejun Zhao
Baidu Inc.
Dongdong Shen
Baidu Inc.
Long Xia
Research Scientist, Baidu
information retrieval, data mining, applied machine learning, recommender system
Dawei Yin
Senior Director, Head of Search Science at Baidu
Machine Learning, Web Mining, Data Mining