π€ AI Summary
Prior work largely overlooks the structural complexity of task orchestration in multi-turn tool interaction. Method: This paper proposes a plan-directed acyclic graph (DAG)-based synthetic data generation framework, modeling tool invocation as controllable DAG structures to enable systematic modeling and evaluation of agent tool orchestration capabilities. Our approach integrates graph neural network representations, GRPO-style reinforcement learning, and a topology-orderβaware graph-structured reward mechanism. Contribution/Results: We introduce a novel benchmark for multi-turn tool interaction that balances solvability and challenge. In RLVR training, our graph-structured reward increases task completion rate by 12.7%, demonstrating the efficacy of DAG-based modeling and structured rewards for learning complex tool chains. The framework significantly improves policy learning efficiency and generalization across diverse tool orchestration scenarios.
π Abstract
Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR training. Experiments show that the dataset presents a challenging but solvable benchmark, and the proposed reward is effective when combined with GRPO-style algorithms, highlighting the importance of leveraging topological structure and data complexity in multi-turn tool use.