Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations in large language model (LLM)-based tool retrieval caused by incomplete and heterogeneous tool documentation, this paper proposes Tool-DE, a benchmark and framework that (1) establishes a standardized, field-enriched tool description schema and (2) designs an extensible documentation expansion pipeline to automatically generate, validate, and refine high-quality tool corpora. On top of this, the paper (3) develops a dedicated dense retriever (Tool-Embed) and an LLM-driven re-ranker (Tool-Rank) for end-to-end tool retrieval. As a key contribution, the paper releases the first large-scale, standardized tool corpus. Empirical evaluation shows state-of-the-art (SOTA) performance on both the ToolRet and Tool-DE benchmarks, supporting the claim that structured documentation expansion improves tool understanding, semantic matching, and retrieval evaluation.

📝 Abstract
Large Language Models (LLMs) have recently demonstrated strong capabilities in tool use, yet progress in tool retrieval remains hindered by incomplete and heterogeneous tool documentation. To address this challenge, we introduce Tool-DE, a new benchmark and framework that systematically enriches tool documentation with structured fields to enable more effective tool retrieval, together with two dedicated models, Tool-Embed and Tool-Rank. We design a scalable document expansion pipeline that leverages both open- and closed-source LLMs to generate, validate, and refine enriched tool profiles at low cost, producing large-scale corpora with 50k instances for embedding-based retrievers and 200k for rerankers. On top of this data, we develop two models specifically tailored for tool retrieval: Tool-Embed, a dense retriever, and Tool-Rank, an LLM-based reranker. Extensive experiments on ToolRet and Tool-DE demonstrate that document expansion substantially improves retrieval performance, with Tool-Embed and Tool-Rank achieving new state-of-the-art results on both benchmarks. We further analyze the contribution of individual fields to retrieval effectiveness, as well as the broader impact of document expansion on both training and evaluation. Overall, our findings highlight both the promise and limitations of LLM-driven document expansion, positioning Tool-DE, along with the proposed Tool-Embed and Tool-Rank, as a foundation for future research in tool retrieval.
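To make the idea of document expansion concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's schema or pipeline: the field names (`when_to_use`, `limitations`, `tags`), the example tool `fx_convert`, and the token-overlap scorer are all hypothetical stand-ins. It only shows why a query can match an enriched profile when it shares no words with the original description.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToolProfile:
    """A tool document plus expanded fields (field names are hypothetical)."""
    name: str
    description: str
    when_to_use: str = ""
    limitations: str = ""
    tags: list[str] = field(default_factory=list)

    def to_text(self, expanded: bool = True) -> str:
        """Flatten the profile into one string for indexing."""
        parts = [self.name, self.description]
        if expanded:
            parts += [self.when_to_use, self.limitations, " ".join(self.tags)]
        return " ".join(p for p in parts if p)


def overlap(query: str, doc: str) -> int:
    """Count distinct query tokens that also occur in the document."""
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9_]+", text.lower()))
    return len(tokenize(query) & tokenize(doc))


# Hypothetical tool; in the paper the extra fields would be LLM-generated.
tool = ToolProfile(
    name="fx_convert",
    description="Converts an amount between two currencies.",
    when_to_use="Use for exchange rate lookups such as dollars to euros.",
    limitations="Rates are not real time.",
    tags=["finance", "currency", "exchange"],
)
query = "dollars to euros exchange rate"

print(overlap(query, tool.to_text(expanded=False)))  # original description only
print(overlap(query, tool.to_text(expanded=True)))   # with expanded fields
```

In this toy case the original description shares no tokens with the query, while the expanded profile covers all five. A real system would use learned embeddings rather than token overlap, but the vocabulary gap is the same one that expansion is meant to close.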
Problem

Research questions and friction points this paper is trying to address.

Tool documentation is often incomplete and heterogeneous, which hinders retrieval
Whether document expansion that enriches tool profiles improves retrieval performance
How to build embedding and reranking models specialized for tool retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tool-DE enriches tool documentation with structured fields
Scalable pipeline uses LLMs to generate validated tool profiles
Tool-Embed (dense retriever) and Tool-Rank (LLM-based reranker) achieve state-of-the-art results on ToolRet and Tool-DE
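
The retriever-plus-reranker arrangement named above follows the general retrieve-then-rerank pattern, which can be sketched as follows. This is a toy sketch, not Tool-Embed or Tool-Rank: a bag-of-words cosine stands in for the dense retriever, a bigram-match counter stands in for the LLM-based reranker, and the three tools in `corpus` are invented for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int) -> list[str]:
    """Stage 1: rank all tools by a cheap similarity and keep the top k."""
    q = Counter(tokens(query))
    scored = [(cosine(q, Counter(tokens(doc))), name) for name, doc in corpus.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


def rerank(query: str, candidates: list[str], corpus: dict[str, str], score_fn) -> list[str]:
    """Stage 2: rescore only the shortlisted tools with a costlier scorer."""
    return sorted(candidates, key=lambda name: score_fn(query, corpus[name]), reverse=True)


def phrase_score(query: str, doc: str) -> int:
    """Toy reranker: count query bigrams that appear verbatim in the document."""
    q = tokens(query)
    d = " ".join(tokens(doc))
    return sum(1 for a, b in zip(q, q[1:]) if f"{a} {b}" in d)


# Invented tool documents, used only to exercise the two stages.
corpus = {
    "weather_now": "Current weather conditions for a city. Use for temperature and rain queries.",
    "fx_convert": "Converts an amount between two currencies. Use for exchange rate lookups.",
    "stock_quote": "Latest stock price for a ticker. Use for market rate and exchange listings.",
}
query = "exchange rate for euros"

shortlist = retrieve(query, corpus, k=2)
ranked = rerank(query, shortlist, corpus, phrase_score)
print(shortlist)
print(ranked)
```

Here the first stage places `stock_quote` ahead of `fx_convert` on raw term overlap, and the second stage reverses that order because only `fx_convert` contains the phrase "exchange rate". The split matters for cost: the cheap scorer runs over the whole corpus, while the expensive one sees only the `k` shortlisted tools.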