Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

📅 2025-03-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the weak generalization capability of large language models (LLMs) in tool learning, where existing approaches either require fine-tuning (limiting adaptation to seen tools) or rely on inefficient in-context demonstrations. We propose Chain-of-Tools, a frozen-LLM framework that enables zero-shot selection and compositional reasoning over large-scale unseen tools via semantic alignment between queries and tool descriptions, CoT-driven dynamic tool retrieval, and sequential tool invocation. Key contributions include: (1) the first zero-shot tool-chain scheduling mechanism for unseen tools; (2) a novel benchmark, SimpleToolQuestions, designed to evaluate generalization across diverse tool sets; and (3) an interpretability analysis identifying critical output dimensions governing tool selection. Experiments on GSM8K-XL, FuncQA, KAMEL, and SimpleToolQuestions demonstrate substantial improvements over baselines, achieving accurate invocation across thousands of unseen tools.
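The zero-shot selection idea summarized above can be sketched as scoring each candidate tool by the semantic similarity between the query and the tool's description, with no fine-tuning and no in-prompt demonstrations. The sketch below is illustrative, not the paper's implementation: `embed` is a hypothetical stand-in for the frozen LLM's hidden-state representation (here a toy hashed bag-of-words so the snippet is self-contained), and `select_tool` is an assumed helper name.

```python
# Minimal sketch of zero-shot tool selection over unseen tools.
# `embed` is a TOY placeholder for a frozen LLM's semantic representation;
# the real method aligns query hidden states with tool-description embeddings.
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Deterministic hashed bag-of-words embedding (placeholder only).
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def select_tool(query: str, tool_descriptions: dict[str, str]) -> str:
    # Rank candidate tools by description-query similarity; because the
    # scoring uses only text embeddings, tools unseen at training time
    # can be selected without any parameter updates.
    q = embed(query)
    return max(tool_descriptions,
               key=lambda name: cosine(q, embed(tool_descriptions[name])))
```

In the actual framework this scoring step would be invoked repeatedly during CoT reasoning, so that each intermediate step can trigger retrieval of the next tool in the chain.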

๐Ÿ“ Abstract
Tool learning can further broaden the usage scenarios of large language models (LLMs). However, most existing methods either require fine-tuning, so the model can only use tools seen in the training data, or add tool demonstrations into the prompt, which is less efficient. In this paper, we present a new tool-learning method, Chain-of-Tools. It makes full use of the powerful semantic representation capability of frozen LLMs to perform tool calling in CoT reasoning with a huge and flexible tool pool that may contain unseen tools. In particular, to validate the effectiveness of our approach in the massive unseen tool scenario, we construct a new dataset, SimpleToolQuestions. We conduct experiments on two numerical reasoning benchmarks (GSM8K-XL and FuncQA) and two knowledge-based question answering benchmarks (KAMEL and SimpleToolQuestions). Experimental results show that our approach outperforms the baseline. We also identify dimensions of the model output that are critical in tool selection, enhancing model interpretability. Our code and data are available at: https://github.com/fairyshine/Chain-of-Tools .
Problem

Research questions and friction points this paper is trying to address.

Enables frozen LLMs to use massive unseen tools
Improves tool learning efficiency without finetuning
Enhances interpretability in tool selection process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen LLMs for tool calling
Supports massive unseen tools pool
Enhances model interpretability via output analysis
Mengsong Wu
Soochow University, Shizi Street 1, 215006 Suzhou, China
Tong Zhu
Soochow University, Shizi Street 1, 215006 Suzhou, China
Han Han
LS2N
Musical Information Retrieval, Acoustics
Xiang Zhang
Soochow University, Shizi Street 1, 215006 Suzhou, China
Wenbiao Shao
Soochow University, Shizi Street 1, 215006 Suzhou, China
Wenliang Chen
Soochow University, Shizi Street 1, 215006 Suzhou, China