🤖 AI Summary
Existing function-calling (FC) data synthesis methods struggle to support multi-turn tool-interaction training, facing three key challenges: poor target-model adaptability, architectural isolation across tools, and strong logical dependencies across turns. This paper proposes an environment-driven framework for generating high-quality multi-turn FC data. The approach integrates environment-API graph modeling, high-level tool query synthesis, and guided iterative chain-of-thought (CoT) generation to overcome traditional limitations in logical coherence and tool decoupling. Specifically, it constructs diverse, high-fidelity training data via graph-structured environment modeling, multi-turn dialogue synthesis, reinforcement-learning-based trajectory sampling, and CoT derivation. Evaluated on the BFCL benchmarks, a 4B-parameter model trained on the synthetic data achieves state-of-the-art performance among models of comparable scale on BFCLv3 and consistently outperforms most closed-source models on BFCLv4.
📝 Abstract
Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ability becomes increasingly central to advanced AI systems, the need for high-quality, multi-turn training data to develop and refine it cannot be overstated. Existing data synthesis methods, such as random environment sampling or multi-agent role-playing, are not powerful enough to generate high-quality data in real-world environments. The practical challenges are threefold: targeted model training, isolation of tool architecture, and multi-turn logical dependency. To address these structural deficiencies, we present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing 1) Environment-API Graph Interactions to gather varied high-quality trajectories, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated CoT generation. Evaluations on the Berkeley Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a 4B model built upon FunReason-MT-generated data achieves state-of-the-art performance among comparable-sized models, outperforming most closed-source models. Further performance improvements on BFCLv4 confirm that FunReason-MT provides a reliable and robust source for agentic learning.
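To make the idea of gathering trajectories from an environment-API graph more concrete, here is a minimal sketch. It is not the FunReason-MT implementation: the abstract does not specify the graph structure or sampling procedure, so the tool names, the `API_GRAPH` layout, and the `sample_trajectory` and `is_valid` helpers are all hypothetical. The sketch assumes an edge from tool A to tool B means that the output of A can supply an input of B, and it samples a dependency-respecting call sequence by random walk.

```python
import random

# Hypothetical toy environment (not from the paper): each key is a tool, and
# each value lists the tools whose inputs can be filled from that tool's output.
API_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["get_invoice", "cancel_booking"],
    "get_invoice": [],
    "cancel_booking": [],
}

def sample_trajectory(graph, start, max_steps, rng):
    """Random walk over the graph; stops at a sink node or after max_steps calls."""
    path = [start]
    while len(path) < max_steps and graph[path[-1]]:
        path.append(rng.choice(graph[path[-1]]))
    return path

def is_valid(graph, path):
    """Check that every consecutive pair of calls follows an edge in the graph."""
    return all(nxt in graph[cur] for cur, nxt in zip(path, path[1:]))

# Seeded RNG so the sampled trajectory is reproducible.
trajectory = sample_trajectory(API_GRAPH, "search_flights", 3, random.Random(0))
print(trajectory)
```

Because each step follows an edge, every later call in a sampled sequence depends on an earlier one, which is the kind of multi-turn logical dependency the abstract identifies as hard to obtain from random environment sampling.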