OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

Prior work largely overlooks the structural complexity of task orchestration in multi-turn tool interaction. Method: This paper proposes a plan-directed acyclic graph (DAG)-based synthetic data generation framework, modeling tool invocation as controllable DAG structures to enable systematic modeling and evaluation of agent tool orchestration capabilities. Our approach integrates graph neural network representations, GRPO-style reinforcement learning, and a topology-order–aware graph-structured reward mechanism. Contribution/Results: We introduce a novel benchmark for multi-turn tool interaction that balances solvability and challenge. In RLVR training, our graph-structured reward increases task completion rate by 12.7%, demonstrating the efficacy of DAG-based modeling and structured rewards for learning complex tool chains. The framework significantly improves policy learning efficiency and generalization across diverse tool orchestration scenarios.

Technology Category

Application Category

📝 Abstract

Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR training. Experiments show that the dataset presents a challenging but solvable benchmark, and the proposed reward is effective when combined with GRPO-style algorithms, highlighting the importance of leveraging topological structure and data complexity in multi-turn tool use.

Problem

Research questions and friction points this paper is trying to address.

Modeling multi-turn tool interactions with DAGs

Generating synthetic data for complex tool orchestration

Enhancing RL training with graph-based rewards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Models tool execution as directed acyclic graphs

Proposes graph-based reward for RLVR training

Generates synthetic data with controllable complexity

🔎 Similar Papers

Tool-Planner: Task Planning with Clusters across Multiple Tools