AI Summary
This paper addresses the weak generalization capability of large language models (LLMs) in tool learning, where existing approaches either require fine-tuning (limiting adaptation to seen tools) or rely on inefficient in-context demonstrations. We propose Chain-of-Tools, a frozen-LLM framework that enables zero-shot selection and compositional reasoning over large-scale unseen tools via semantic alignment between queries and tool descriptions, CoT-driven dynamic tool retrieval, and sequential tool invocation. Key contributions include: (1) the first zero-shot tool-chain scheduling mechanism for unseen tools; (2) a novel benchmark, SimpleToolQuestions, designed to evaluate generalization across diverse tool sets; and (3) an interpretability analysis identifying critical output dimensions governing tool selection. Experiments on GSM8K-XL, FuncQA, KAMEL, and SimpleToolQuestions demonstrate substantial improvements over baselines, achieving accurate invocation across thousands of unseen tools.
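The semantic-alignment step described above can be illustrated with a minimal sketch: score every tool description against the current query (or CoT state) representation and pick the best match. The function name, the use of plain cosine similarity, and the NumPy representation are illustrative assumptions, not the paper's actual implementation, which uses frozen-LLM hidden states.

```python
import numpy as np

def select_tool(query_vec: np.ndarray, tool_desc_vecs: np.ndarray):
    """Hypothetical zero-shot tool selection by embedding similarity.

    query_vec: shape (d,), a frozen-LLM representation of the query.
    tool_desc_vecs: shape (n, d), representations of n tool
    descriptions; the pool may include tools unseen in training,
    since no tool-specific parameters are learned.
    Returns the index of the best-matching tool and all scores.
    """
    q = query_vec / np.linalg.norm(query_vec)
    t = tool_desc_vecs / np.linalg.norm(tool_desc_vecs, axis=1, keepdims=True)
    scores = t @ q  # cosine similarity against every tool description
    return int(np.argmax(scores)), scores
```

Because selection is a similarity lookup rather than a learned classification head, new tools can be added to the pool at inference time simply by encoding their descriptions.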
Abstract
Tool learning can further broaden the usage scenarios of large language models (LLMs). However, most existing methods either require fine-tuning, so the model can only use tools seen in the training data, or add tool demonstrations into the prompt, which is less efficient. In this paper, we present a new tool learning method, Chain-of-Tools. It makes full use of the powerful semantic representation capability of frozen LLMs to perform tool calling during CoT reasoning over a huge and flexible tool pool that may contain unseen tools. In particular, to validate the effectiveness of our approach in the massive-unseen-tool scenario, we construct a new dataset, SimpleToolQuestions. We conduct experiments on two numerical reasoning benchmarks (GSM8K-XL and FuncQA) and two knowledge-based question answering benchmarks (KAMEL and SimpleToolQuestions). Experimental results show that our approach outperforms the baseline. We also identify dimensions of the model output that are critical to tool selection, improving model interpretability. Our code and data are available at: https://github.com/fairyshine/Chain-of-Tools .