🤖 AI Summary
To address the high annotation failure rate and low efficiency of synthetic tool-use data generation, this paper proposes ToolGrad, a zero-failure, "answer-first" framework for automated data synthesis. ToolGrad reverses the conventional generation pipeline: it first constructs valid tool-invocation chains via agent-driven iterative chaining and text-gradient-guided search, and then back-translates each chain into a natural-language query that semantically aligns with the execution trace. This inversion guarantees 100% annotation validity while also increasing data complexity and synthesis throughput. Using ToolGrad, the authors synthesize a high-quality 5K-sample dataset. Models trained on this data consistently outperform open-source and commercial large language models of comparable scale on out-of-distribution tool-use benchmarks, achieving superior performance at significantly lower training cost.
📝 Abstract
Prior work synthesizes tool-use LLM datasets by first generating a user query and then producing complex tool-use annotations via search procedures like DFS. This leads to inevitable annotation failures and low efficiency in data generation. We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual "gradients", and then synthesizes corresponding user queries. This "answer-first" approach yields ToolGrad-5k, a dataset generated with more complex tool use, lower cost, and a 100% pass rate. Experiments show that models trained on ToolGrad-5k outperform those trained on more expensive baseline datasets, as well as proprietary LLMs, even on OOD benchmarks.
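The "answer-first" inversion can be sketched in a few lines: grow a valid tool chain step by step under some scoring signal, then back-translate the finished chain into a query. The sketch below is purely illustrative; the tool names, the toy scoring heuristic standing in for an LLM's textual gradient, and the template-based query synthesis are all assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of ToolGrad's "answer-first" pipeline.
# All names and heuristics here are illustrative assumptions.

TOOLS = ["search_flights", "get_weather", "book_hotel", "convert_currency"]

def propose_extensions(chain):
    """Candidate tools that could extend the current chain."""
    return [t for t in TOOLS if t not in chain]

def textual_gradient_score(chain, candidate):
    """Stand-in for an LLM critique ("textual gradient") that scores how
    well a candidate tool extends the chain; here just a toy heuristic."""
    return len(chain) + TOOLS.index(candidate) % 3

def grow_chain(steps=3):
    """Answer-first: iteratively build a valid tool-use chain, guided by
    the gradient signal, *before* any user query exists."""
    chain = []
    for _ in range(steps):
        candidates = propose_extensions(chain)
        if not candidates:
            break
        best = max(candidates, key=lambda c: textual_gradient_score(chain, c))
        chain.append(best)  # every appended call is valid by construction
    return chain

def synthesize_query(chain):
    """Back-translate the executed chain into a user query (a toy template
    here; in the paper an LLM writes a semantically aligned query)."""
    return "Help me with: " + ", then ".join(chain)

chain = grow_chain()
print(chain)
print(synthesize_query(chain))
```

Because the query is derived from an already-valid chain rather than the reverse, every generated sample passes by construction, which is what yields the 100% pass rate the abstract reports.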