Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing tool-augmented LLM frameworks (e.g., ReAct) rely on incremental, step-by-step decision-making, which makes them prone to local optima and ill-suited to complex queries that require coordinated multi-tool execution. Method: We propose a planner-centric Plan-Execute architecture built around a global directed acyclic graph (DAG) planning mechanism, which makes multi-step tool invocation interpretable end to end. Contribution/Results: We introduce a planner-centric paradigm for complex tool orchestration and ComplexTool-Plan, a large-scale benchmark of complex queries designed to evaluate multi-tool planning capability. We further design a two-stage training strategy that combines supervised fine-tuning (SFT) with Group Relative Policy Optimization (GRPO). Paired with a capable executor, the framework achieves state-of-the-art performance on StableToolBench for complex queries, with gains in end-to-end execution and in the robustness of multi-tool workflows.

📝 Abstract

Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), systematically enhancing the Planner's tool selection accuracy and global planning awareness through structured DAG-based planning. When integrated with a capable executor, our framework achieves state-of-the-art performance on the StableToolBench benchmark for complex user queries, demonstrating superior end-to-end execution capabilities and robust handling of intricate multi-tool workflows.
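To make the idea of global DAG planning concrete, the sketch below shows one way a plan could be represented as a dependency graph and run in topological order. It is only an illustration under stated assumptions: the plan schema, the step ids, and the three toy tools are hypothetical and are not taken from the paper, whose actual Planner output format and executor are not described here.

```python
from graphlib import TopologicalSorter

# Hypothetical tools; names and signatures are illustrative only.
TOOLS = {
    "search_flights": lambda city: f"flights to {city}",
    "search_hotels": lambda city: f"hotels in {city}",
    "build_itinerary": lambda flights, hotels: f"itinerary({flights}; {hotels})",
}

# Assumed plan schema: step id -> (tool name, literal args, upstream step ids).
# Outputs of upstream steps are appended to the literal args, in order.
PLAN = {
    "s1": ("search_flights", ["Paris"], []),
    "s2": ("search_hotels", ["Paris"], []),
    "s3": ("build_itinerary", [], ["s1", "s2"]),
}


def execute_plan(plan, tools):
    """Run every step after all of its dependencies have produced a result."""
    deps = {step: set(spec[2]) for step, spec in plan.items()}
    # static_order() raises graphlib.CycleError if the plan is not acyclic.
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for step in order:
        tool_name, literal_args, dep_ids = plan[step]
        args = list(literal_args) + [results[d] for d in dep_ids]
        results[step] = tools[tool_name](*args)
    return order, results


order, results = execute_plan(PLAN, TOOLS)
print(results["s3"])
```

Because the whole graph is fixed before any tool runs, steps with no mutual dependency (here `s1` and `s2`) could also be dispatched in parallel, which a step-by-step loop in the ReAct style does not expose.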

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations in tool-augmented LLMs for complex queries

Proposes Planner-centric framework to overcome local optimization traps

Enhances multi-tool coordination through global DAG planning methodology

Innovation

Methods, ideas, or system contributions that make the work stand out.

Planner-centric Plan-Execute paradigm for LLM reasoning

Global DAG planning for complex multi-tool queries

Two-stage training with SFT and GRPO optimization
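As a rough illustration of the GRPO stage, the sketch below computes group-relative advantages: several candidate plans are sampled for the same query, each is scored, and each score is normalized against the mean and standard deviation of its own group. The reward values are made up, and the use of the population standard deviation and the `eps` term are assumptions; the paper's reward design is not given in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scores of several plans sampled for the same query.
    eps: small constant (an assumption) that avoids division by zero
         when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled plans (1.0 = valid plan, 0.0 = invalid).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Plans that score above their group's average get a positive advantage and are reinforced, while below-average plans are pushed down, so no separate value network is needed to provide a baseline.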
