ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often lack domain-specific tools for complex reasoning, and existing automated tool-generation methods produce unstructured tool repositories that are hard to retrieve from. Method: This paper proposes a scalable, automated framework for tool generation and aggregation. Its core contribution is a multi-agent collaboration mechanism: a semantic clustering step groups tools by functionality; a code-refactoring agent abstracts shared logic into versatile aggregated tools; and a reviewing agent validates functional correctness—together consolidating many scattered, question-specific tools into a compact, structured tool library without loss of functionality. Contribution/Results: Experiments demonstrate significant improvements in tool retrieval accuracy and reasoning performance, outperforming baselines on domain-specific tasks (e.g., physics question answering). The framework also scales better than unstructured baselines as the number of generated tools grows, enabling efficient, maintainable tool-ecosystem construction.

📝 Abstract
Large Language Models (LLMs) equipped with external tools have demonstrated enhanced performance on complex reasoning tasks. The widespread adoption of this tool-augmented reasoning is hindered by the scarcity of domain-specific tools. For instance, in domains such as physics question answering, suitable and specialized tools are often missing. Recent work has explored automating tool creation by extracting reusable functions from Chain-of-Thought (CoT) reasoning traces; however, these approaches face a critical scalability bottleneck. As the number of generated tools grows, storing them in an unstructured collection leads to significant retrieval challenges, including an expanding search space and ambiguity between functionally related tools. To address this, we propose a systematic approach to automatically refactor an unstructured collection of tools into a structured tool library. Our system first generates discrete, task-specific tools and clusters them into semantically coherent topics. Within each cluster, we introduce a multi-agent framework to consolidate scattered functionalities: a code agent refactors code to extract shared logic and creates versatile, aggregated tools, while a reviewing agent ensures that these aggregated tools maintain the complete functional capabilities of the original set. This process transforms numerous question-specific tools into a smaller set of powerful, aggregated tools without loss of functionality. Experimental results demonstrate that our approach significantly improves tool retrieval accuracy and overall reasoning performance across multiple reasoning tasks. Furthermore, our method shows enhanced scalability compared with baselines as the number of question-specific tools increases.
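The clustering step described in the abstract can be sketched as follows. The paper does not specify its embedding model or clustering algorithm, so this minimal sketch uses toy bag-of-words vectors with cosine similarity and a greedy nearest-seed assignment; all names and the seeding strategy are illustrative only.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real pipeline would use a neural text embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(c * b[w] for w, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_tools(descriptions, k):
    """Assign each tool description to the most similar of k seed descriptions.
    Deterministic toy seeding (first k items); a real system would use
    k-means++ over embeddings or an LLM-driven clustering agent."""
    embs = [embed(d) for d in descriptions]
    clusters = {s: [s] for s in range(k)}
    for i in range(k, len(descriptions)):
        best = max(range(k), key=lambda s: cosine(embs[i], embs[s]))
        clusters[best].append(i)
    return list(clusters.values())

tools = [
    "compute projectile range from initial velocity and launch angle",
    "compute current from voltage and resistance using ohm's law",
    "compute projectile maximum height from initial velocity and angle",
    "compute total resistance of resistors in series",
]
print(cluster_tools(tools, k=2))  # → [[0, 2], [1, 3]]
```

The two projectile-motion tools land in one cluster and the two circuit tools in the other, giving the "semantically coherent topics" that the later aggregation step operates on.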
Problem

Research questions and friction points this paper is trying to address.

Automating tool creation from reasoning traces for LLMs
Structuring tool libraries to overcome retrieval scalability issues
Aggregating scattered tools into versatile functional clusters
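The retrieval-scalability point above can be illustrated with a two-stage lookup over a structured library: score cluster summaries first, then search only inside the best-matching cluster instead of the whole flat collection. This is a sketch under the assumption of simple lexical scoring; the library layout and scoring function are illustrative, not the paper's actual retrieval mechanism.

```python
def overlap(query, text):
    """Crude lexical relevance score; a real system would use embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t)

def retrieve(query, library):
    """Two-stage retrieval over a structured library: pick the best-matching
    cluster by its summary, then rank only that cluster's tools. The search
    space per query shrinks from all tools to one cluster's tools."""
    best_cluster = max(library, key=lambda c: overlap(query, c["summary"]))
    return max(best_cluster["tools"], key=lambda t: overlap(query, t))

library = [
    {"summary": "projectile motion kinematics tools",
     "tools": ["projectile range from velocity and angle",
               "projectile maximum height from velocity and angle"]},
    {"summary": "electric circuit analysis tools",
     "tools": ["current from voltage and resistance",
               "total resistance of resistors in series"]},
]
print(retrieve("what is the maximum height of a projectile", library))
# → projectile maximum height from velocity and angle
```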
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically refactors unstructured tools into a structured library
Clusters tools into semantically coherent topics for organization
Uses multi-agent framework to consolidate and aggregate tools
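The consolidate-and-review loop can be sketched as follows: a code agent proposes one aggregated tool covering several question-specific tools, and a reviewing agent accepts it only if it reproduces every original tool's output on sampled inputs. The tool bodies, the aggregated entry point, and the `review` check are all hypothetical stand-ins for the LLM-generated code and the paper's reviewing agent.

```python
import math

# Two question-specific tools, as an automated generation step might produce them.
def projectile_range(v, angle_deg, g=9.8):
    a = math.radians(angle_deg)
    return v * v * math.sin(2 * a) / g

def projectile_max_height(v, angle_deg, g=9.8):
    a = math.radians(angle_deg)
    return (v * math.sin(a)) ** 2 / (2 * g)

# Aggregated tool a code-refactoring agent might propose: one entry point
# exposing both capabilities behind a `quantity` switch (illustrative API).
def projectile_kinematics(v, angle_deg, quantity, g=9.8):
    a = math.radians(angle_deg)
    if quantity == "range":
        return v * v * math.sin(2 * a) / g
    if quantity == "max_height":
        return (v * math.sin(a)) ** 2 / (2 * g)
    raise ValueError(f"unknown quantity: {quantity}")

def review(original, aggregated, cases):
    """Reviewing-agent check: the aggregated tool must reproduce every
    original tool's output on the sampled inputs (functional coverage)."""
    return all(
        math.isclose(fn(*args), aggregated(*args, quantity=q))
        for q, fn in original.items()
        for args in cases
    )

cases = [(20, 30), (15, 45), (10, 60)]
original = {"range": projectile_range, "max_height": projectile_max_height}
print(review(original, projectile_kinematics, cases))  # → True
```

If the review fails, the code agent would revise the aggregated tool and resubmit, so the final library keeps the full functionality of the original scattered tools in fewer, more general entry points.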