LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls

📅 2025-11-12
🤖 AI Summary
Existing LLM tool-use methods rely on static data pipelines that decouple data generation from model training, hindering adaptive focus on model weaknesses and effective removal of noisy labels, and thus impairing training efficiency. This paper introduces the first open-source, model-aware data evolution framework, establishing a closed-loop training paradigm comprising three tightly integrated modules: *capability diagnosis*, *label verification*, and *error-driven expansion*. It jointly optimizes data and model through iterative refinement: greedy capability probing identifies model deficiencies; judge-guided label verification purifies training data; and error feedback steers targeted data augmentation. The resulting 8B model achieves state-of-the-art performance on BFCL-v3 and ACEBench—surpassing same-scale SOTA models and even outperforming its 32B data generator—marking the first demonstration of data–model co-evolution within an open-source ecosystem.

📝 Abstract
Augmenting Large Language Models (LLMs) with external tools enables them to execute complex, multi-step tasks. However, tool learning is hampered by static synthetic data pipelines, in which data generation and model training are executed as two separate, non-interactive processes. This approach fails to adaptively focus on a model's specific weaknesses and allows noisy labels to persist, degrading training efficiency. We introduce LoopTool, a fully automated, model-aware data evolution framework that closes this loop by tightly integrating data synthesis and model training. LoopTool iteratively refines both the data and the model through three synergistic modules: (1) Greedy Capability Probing (GCP) diagnoses the model's mastered and failed capabilities; (2) Judgement-Guided Label Verification (JGLV) uses an open-source judge model to find and correct annotation errors, progressively purifying the dataset; and (3) Error-Driven Data Expansion (EDDE) generates new, challenging samples based on identified failures. This closed-loop process operates within a cost-effective, open-source ecosystem, eliminating dependence on expensive closed-source APIs. Experiments show that our 8B model trained with LoopTool significantly surpasses its 32B data generator and achieves new state-of-the-art results on the BFCL-v3 and ACEBench benchmarks for its scale. Our work demonstrates that closed-loop, self-refining data pipelines can dramatically enhance the tool-use capabilities of LLMs.
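The three modules described in the abstract form one iteration of the loop: probe for failures, verify the labels of those failures with a judge, then expand the purified failure set into harder samples. A minimal sketch of that round-trip is below. All function names and data shapes here are illustrative assumptions for exposition, not the paper's released implementation; `model_solve`, `judge_verify`, and `generate_variants` stand in for the trained model, the open-source judge, and the data generator respectively.

```python
# Hypothetical sketch of one LoopTool round (GCP -> JGLV -> EDDE).
# Samples are plain dicts: {"prompt": ..., "label": ...}.

def looptool_round(dataset, model_solve, judge_verify, generate_variants):
    """Return the refined training pool produced by one closed-loop round."""
    # (1) Greedy Capability Probing: keep only samples the current model fails.
    failures = [s for s in dataset if model_solve(s["prompt"]) != s["label"]]

    # (2) Judgement-Guided Label Verification: a judge model re-checks each
    # failed sample's annotation, returning a (possibly corrected) label,
    # or None if the sample is unsalvageable noise and should be dropped.
    purified = []
    for s in failures:
        verified_label = judge_verify(s)
        if verified_label is not None:
            purified.append({**s, "label": verified_label})

    # (3) Error-Driven Data Expansion: synthesize harder variants of the
    # purified failure cases to target the diagnosed weaknesses.
    expanded = [v for s in purified for v in generate_variants(s)]

    return purified + expanded
```

In practice the returned pool would feed the next fine-tuning round, and the loop repeats with the updated model; the key property is that both probing and expansion depend on the *current* model's errors, which is what distinguishes this from a static pipeline.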
Problem

Research questions and friction points this paper is trying to address.

Closing the data-training loop for robust LLM tool calls
Addressing static synthetic data pipelines with non-interactive processes
Correcting noisy labels and focusing on model weaknesses adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-loop framework integrates data synthesis with model training
Iterative modules diagnose errors and purify training datasets
Error-driven data expansion generates challenging samples automatically