AI Summary
This paper addresses the weak generalization capability of large language models (LLMs) in tool learning, where existing approaches either require fine-tuning (limiting adaptation to seen tools) or rely on inefficient in-context demonstrations. We propose Chain-of-Tools, a frozen-LLM framework that enables zero-shot selection and compositional reasoning over large-scale unseen tools via semantic alignment between queries and tool descriptions, CoT-driven dynamic tool retrieval, and sequential tool invocation. Key contributions include: (1) the first zero-shot tool-chain scheduling mechanism for unseen tools; (2) a novel benchmark, SimpleToolQuestions, designed to evaluate generalization across diverse tool sets; and (3) an interpretability analysis identifying critical output dimensions governing tool selection. Experiments on GSM8K-XL, FuncQA, KAMEL, and SimpleToolQuestions demonstrate substantial improvements over baselines, achieving accurate invocation across thousands of unseen tools.
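The semantic-alignment step described above can be illustrated with a minimal sketch: score every tool description against the current query (or CoT state) representation and pick the best match. The function name, the use of plain cosine similarity, and the NumPy representation are illustrative assumptions, not the paper's actual implementation, which uses frozen-LLM hidden states.

```python
import numpy as np

def select_tool(query_vec: np.ndarray, tool_desc_vecs: np.ndarray):
    """Hypothetical zero-shot tool selection by embedding similarity.

    query_vec: shape (d,), a frozen-LLM representation of the query.
    tool_desc_vecs: shape (n, d), representations of n tool
    descriptions; the pool may include tools unseen in training,
    since no tool-specific parameters are learned.
    Returns the index of the best-matching tool and all scores.
    """
    q = query_vec / np.linalg.norm(query_vec)
    t = tool_desc_vecs / np.linalg.norm(tool_desc_vecs, axis=1, keepdims=True)
    scores = t @ q  # cosine similarity against every tool description
    return int(np.argmax(scores)), scores
```

Because selection is a similarity lookup rather than a learned classification head, new tools can be added to the pool at inference time simply by encoding their descriptions.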
Abstract
Tool learning can further broaden the usage scenarios of large language models (LLMs). However, most existing methods either require fine-tuning, so the model can only use tools seen in the training data, or add tool demonstrations into the prompt, which is less efficient. In this paper, we present a new tool learning method, Chain-of-Tools. It makes full use of the powerful semantic representation capability of frozen LLMs to perform tool calling during CoT reasoning over a huge and flexible tool pool that may contain unseen tools. In particular, to validate the effectiveness of our approach in the massive-unseen-tool scenario, we construct a new dataset, SimpleToolQuestions. We conduct experiments on two numerical reasoning benchmarks (GSM8K-XL and FuncQA) and two knowledge-based question answering benchmarks (KAMEL and SimpleToolQuestions). Experimental results show that our approach outperforms the baseline. We also identify dimensions of the model output that are critical to tool selection, improving model interpretability. Our code and data are available at: https://github.com/fairyshine/Chain-of-Tools .