Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of large language model (LLM) agents in tool invocation: performance depends heavily on the quality of human-written tool interface descriptions and degrades sharply in cold-start settings with many candidate tools or no execution traces. To overcome this, the authors propose Trace-Free+, a curriculum learning framework that transfers supervision from trace-rich environments to trace-free deployment, guiding the model to learn reusable tool-use patterns and automatically refine tool descriptions without relying on execution trajectories. Trace-Free+ supports cross-tool generalization and scales effectively to tool sets of hundreds of functions. Experiments on StableToolBench and RestBench demonstrate that Trace-Free+ substantially improves invocation accuracy on unseen tools, with strong cross-domain generalization and robustness at scale.

📝 Abstract
The performance of LLM-based agents depends not only on the agent itself but also on the quality of the tool interfaces it consumes. While prior work has focused heavily on agent fine-tuning, tool interfaces (including natural language descriptions and parameter schemas) remain largely human-oriented and often become a bottleneck, especially when agents must select from large candidate tool sets. Existing approaches to improving tool interfaces rely on execution traces, which are frequently unavailable in cold-start or privacy-constrained settings, and typically optimize each tool independently, limiting scalability and generalization to unseen tools. We propose Trace-Free+, a curriculum learning framework that progressively transfers supervision from trace-rich settings to trace-free deployment, encouraging the model to abstract reusable interface-usage patterns and tool usage outcomes. To support this approach, we construct a large-scale dataset of high-quality tool interfaces using a structured workflow over a diverse collection of tools. Experiments on StableToolBench and RestBench show consistent gains on unseen tools, strong cross-domain generalization, and robustness as the number of candidate tools scales to over 100, demonstrating that tool interface optimization is a practical and deployable complement to agent fine-tuning.
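To illustrate the core idea of optimizing the tool interface rather than the agent, the sketch below shows the kind of trace-free normalization a learned rewriter would perform on a raw tool spec. The `rewrite_description` helper and the `weather_tool` spec are hypothetical, not the paper's Trace-Free+ model; a fixed template stands in here for the learned rewriting step.

```python
def rewrite_description(tool: dict) -> str:
    """Hypothetical trace-free rewriter: normalize a raw tool spec into a
    description an LLM agent can match against user intents.
    In Trace-Free+ a learned model would replace this fixed template."""
    params = tool.get("parameters", {})
    required = [name for name, spec in params.items() if spec.get("required")]
    # One-line summary, then one line per parameter with type and optionality.
    lines = [f"{tool['name']}: {tool.get('summary', '').strip().rstrip('.')}."]
    for name, spec in params.items():
        req = "required" if name in required else "optional"
        lines.append(f"- {name} ({spec.get('type', 'string')}, {req}): {spec.get('desc', '')}")
    # Make the invocation precondition explicit, which helps selection
    # among large candidate tool sets.
    if required:
        lines.append(f"Call only when {', '.join(required)} can be filled from the user request.")
    return "\n".join(lines)

# Example raw spec (invented for illustration).
weather_tool = {
    "name": "get_forecast",
    "summary": "returns a weather forecast",
    "parameters": {
        "city": {"type": "string", "required": True, "desc": "city name"},
        "days": {"type": "integer", "required": False, "desc": "horizon in days"},
    },
}

print(rewrite_description(weather_tool))
```

A learned rewriter trained under the paper's curriculum would produce such normalized descriptions directly from the spec, without ever observing execution traces for the tool.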
Problem

Research questions and friction points this paper is trying to address.

tool interface
LLM-agent
tool description
cold-start
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool interface optimization
trace-free learning
curriculum learning
LLM-agent tool use
cross-domain generalization
🔎 Similar Papers
- Ruocheng Guo, Intuit AI Research (LLMs, Causal ML, Data Mining)
- Kaiwen Dong, Intuit AI Research, Mountain View, CA, USA
- Xiang Gao, Intuit (deep learning)
- Kamalika Das, Intuit AI Research, Mountain View, CA, USA