OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs

πŸ“… 2025-10-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Prior work largely overlooks the structural complexity of task orchestration in multi-turn tool interaction. Method: This paper proposes a plan-directed acyclic graph (DAG)-based synthetic data generation framework, modeling tool invocation as controllable DAG structures to enable systematic modeling and evaluation of agent tool orchestration capabilities. Our approach integrates graph neural network representations, GRPO-style reinforcement learning, and a topology-order–aware graph-structured reward mechanism. Contribution/Results: We introduce a novel benchmark for multi-turn tool interaction that balances solvability and challenge. In RLVR training, our graph-structured reward increases task completion rate by 12.7%, demonstrating the efficacy of DAG-based modeling and structured rewards for learning complex tool chains. The framework significantly improves policy learning efficiency and generalization across diverse tool orchestration scenarios.

Technology Category

Application Category

πŸ“ Abstract
Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR training. Experiments show that the dataset presents a challenging but solvable benchmark, and the proposed reward is effective when combined with GRPO-style algorithms, highlighting the importance of leveraging topological structure and data complexity in multi-turn tool use.
Problem

Research questions and friction points this paper is trying to address.

Modeling multi-turn tool interactions with DAGs
Generating synthetic data for complex tool orchestration
Enhancing RL training with graph-based rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models tool execution as directed acyclic graphs
Proposes graph-based reward for RLVR training
Generates synthetic data with controllable complexity
πŸ”Ž Similar Papers
No similar papers found.