🤖 AI Summary
Existing methods for tool-augmented reasoning generalize poorly: they rely heavily on memorizing static trajectories and struggle to invoke tools robustly when tools are novel or change over time. To overcome this, we propose ToolMaster, a framework built on a “trial-and-execution” joint training paradigm that guides large language models to actively trial tools through environmental interaction and to self-correct their usage. Our approach combines teacher-guided imitation learning, using trajectories that explicitly demonstrate tool trials and self-correction, with reinforcement learning that jointly optimizes tool exploration and execution. Experiments show that ToolMaster significantly outperforms current baselines across multiple tasks involving previously unseen tools, improving both generalization and robustness.
📝 Abstract
Equipping Large Language Models (LLMs) with external tools enables them to solve complex real-world problems. However, the robustness of existing methods remains a critical challenge when they confront novel or evolving tools. Trajectory-centric paradigms primarily rely on memorizing static solution paths during training, which limits the ability of LLMs to generalize tool usage to newly introduced or previously unseen tools. In this paper, we propose ToolMaster, a framework that shifts tool use from imitating golden tool-calling trajectories to actively learning tool usage through interaction with the environment. To optimize LLMs for tool planning and invocation, ToolMaster adopts a trial-and-execution paradigm: LLMs are first trained to imitate teacher-generated trajectories that contain explicit tool trials and self-correction, and are then trained with reinforcement learning to coordinate the trial and execution phases jointly. This process enables agents to autonomously explore correct tool usage by actively interacting with environments and forming experiential knowledge that benefits tool execution. Experimental results demonstrate that ToolMaster significantly outperforms existing baselines in generalization and robustness on unseen or unfamiliar tools. All code and data are available at https://github.com/NEUIR/ToolMaster.
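The trial-and-execution idea can be illustrated with a small, purely hypothetical sketch. It is not taken from the ToolMaster code base and involves no LLM or training; it only shows an agent probing an unfamiliar tool, reading the environment's error feedback, and reusing the calling convention that worked when it executes the real task. The tool `convert_currency`, its toy exchange rates, and the helpers `call_tool`, `trial_phase`, and `execute_phase` are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def convert_currency(amount: float, source: str, target: str) -> float:
    """Hypothetical unseen tool with fixed toy exchange rates."""
    rates = {("USD", "EUR"): 0.5, ("EUR", "USD"): 2.0}
    return amount * rates[(source, target)]


def call_tool(tool: Callable[..., Any], args: Dict[str, Any]) -> Tuple[bool, Any]:
    """Call the tool and return (success, result or error text) as environment feedback."""
    try:
        return True, tool(**args)
    except Exception as exc:  # the error message is the feedback the agent learns from
        return False, f"{type(exc).__name__}: {exc}"


def trial_phase(
    tool: Callable[..., Any],
    schemas: Sequence[Tuple[str, ...]],
    probe_values: Sequence[Any],
) -> Tuple[Optional[Tuple[str, ...]], List[Tuple[Tuple[str, ...], bool, Any]]]:
    """Probe candidate argument schemas with a cheap input; keep the first one that works."""
    experience = []
    for schema in schemas:
        ok, feedback = call_tool(tool, dict(zip(schema, probe_values)))
        experience.append((schema, ok, feedback))
        if ok:
            return schema, experience
    return None, experience


def execute_phase(
    tool: Callable[..., Any], schema: Tuple[str, ...], values: Sequence[Any]
) -> Any:
    """Run the real task using the schema discovered during the trial phase."""
    ok, result = call_tool(tool, dict(zip(schema, values)))
    if not ok:
        raise RuntimeError(result)
    return result


# The agent's first guess at the argument names is wrong; the second is right.
candidate_schemas = [
    ("value", "from_currency", "to_currency"),
    ("amount", "source", "target"),
]
schema, experience = trial_phase(convert_currency, candidate_schemas, (1.0, "USD", "EUR"))
result = execute_phase(convert_currency, schema, (10.0, "EUR", "USD"))
```

Here the failed first probe plays the role of a tool trial, and the recorded `experience` stands in for the experiential knowledge that the abstract says benefits execution. In the paper this behavior is learned through imitation and reinforcement learning rather than hard-coded as a loop.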