Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three key challenges in LLM-based tool calling: high human annotation cost, poor generalization, and inherent quality limitations of single-model synthesis. To this end, we propose InfTool, a fully automated multi-agent framework featuring a novel self-evolving multi-agent closed loop. Relying solely on raw API specifications, InfTool synergistically employs a user simulator, a tool-calling assistant, and an MCP server to automatically generate and validate diverse execution trajectories—enabling zero-shot, human-annotation-free data synthesis and iterative model refinement. Its core innovations include Group Relative Policy Optimization (GRPO) and a gated reward mechanism, which jointly overcome intrinsic bottlenecks in output quality and tool coverage for single models. On the BFCL benchmark, our 32B model achieves 70.9% accuracy—up from 19.8% (+258%)—surpassing a 10× larger model and matching Claude-Opus, all trained exclusively on synthetic data.

📝 Abstract
Enabling Large Language Models (LLMs) to reliably invoke external tools remains a critical bottleneck for autonomous agents. Existing approaches suffer from three fundamental challenges: expensive human annotation for high-quality trajectories, poor generalization to unseen tools, and quality ceilings inherent in single-model synthesis that perpetuate biases and coverage gaps. We introduce InfTool, a fully autonomous framework that breaks these barriers through self-evolving multi-agent synthesis. Given only raw API specifications, InfTool orchestrates three collaborative agents (User Simulator, Tool-Calling Assistant, and MCP Server) to generate diverse, verified trajectories spanning single-turn calls to complex multi-step workflows. The framework establishes a closed loop: synthesized data trains the model via Group Relative Policy Optimization (GRPO) with gated rewards, the improved model generates higher-quality data targeting capability gaps, and this cycle iterates without human intervention. Experiments on the Berkeley Function-Calling Leaderboard (BFCL) demonstrate that InfTool transforms a base 32B model from 19.8% to 70.9% accuracy (+258%), surpassing models 10x larger and rivaling Claude-Opus, all from synthetic data without human annotation.
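The synthesis side of the loop described above can be sketched in miniature: a simulated user poses a request derived from the API specs, the assistant proposes a tool call, the server executes it, and only trajectories that pass executable verification are kept for training. Everything here is an illustrative stand-in (the toy `API_SPECS`, the `synthesize` helper, the lambda "assistants"), not the paper's actual implementation.

```python
# Toy API specs: each "tool" is directly executable, so a trajectory can be
# verified by comparing the executed result against the user's true intent.
API_SPECS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def synthesize(queries, choose_tool):
    """One synthesis pass: execute proposed calls, keep only verified traces."""
    kept = []
    for intent, args in queries:
        tool = choose_tool(intent)                 # assistant picks a tool
        result = API_SPECS[tool](*args)            # MCP-style execution
        if result == API_SPECS[intent](*args):     # validation gate
            kept.append({"query": intent, "call": tool, "result": result})
    return kept

queries = [("add", (2, 3)), ("mul", (2, 3)), ("add", (4, 4))]
good = synthesize(queries, lambda intent: intent)  # faithful assistant: all 3 kept
bad = synthesize(queries, lambda intent: "mul")    # always calls mul: only 1 survives
```

The verification gate is what lets the loop run without human annotation: wrong tool choices are filtered out before they can contaminate training data, and as the model improves, more of its trajectories survive the gate.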
Problem

Research questions and friction points this paper is trying to address.

Generates diverse tool-use data without human annotation
Improves generalization to unseen tools for LLMs
Overcomes single-model synthesis biases and coverage gaps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent role-playing for autonomous tool-use data synthesis
Closed-loop self-evolution via Group Relative Policy Optimization
Fully synthetic training achieving state-of-the-art accuracy
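The GRPO-with-gated-rewards idea in the second bullet can be illustrated with a small numeric sketch. GRPO normalizes each sampled trajectory's reward against its own group (subtract the group mean, divide by the group standard deviation); the gate zeroes out reward for malformed tool calls so invalid outputs never earn credit. The specific reward values and threshold here are illustrative assumptions, not the paper's exact reward design.

```python
import statistics

def gated_reward(is_valid_call, task_success):
    # Gate: a malformed tool call gets zero reward regardless of outcome.
    if not is_valid_call:
        return 0.0
    return 1.0 if task_success else 0.2  # illustrative partial credit

def group_relative_advantages(rewards):
    # GRPO-style normalization within one sampled group of trajectories.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# A group of four sampled trajectories: (valid_call?, task_success?)
group = [(True, True), (True, False), (False, True), (True, True)]
rewards = [gated_reward(v, s) for v, s in group]
advs = group_relative_advantages(rewards)
```

By construction the advantages sum to zero within the group, so successful valid calls are pushed up exactly as much as gated or failed ones are pushed down; no separate value network is needed.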
👥 Authors
Yuwen Li, Zhejiang University (numerical analysis, scientific computing)
Wei Zhang, Beihang University
Zelong Huang, Sichuan University
Mason Yang, Sichuan University
Jiajun Wu, Beihang University
Shawn Guo, IQuest Research
Huahao Hu, Sichuan University
Lingyi Sun, Sichuan University
Jian Yang, Beihang University
Mingjie Tang, Purdue University (database, data mining, machine learning, spatial data processing)
Byran Dai, IQuest Research