Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction

📅 2026-01-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing tool-augmented reasoning methods, which rely heavily on memorizing static trajectories and struggle to invoke tools robustly when those tools are novel or change over time. The authors propose ToolMaster, a framework built on a “trial-and-execution” training paradigm that guides large language models to actively trial tools through environment interaction and to self-correct their usage. The approach first applies teacher-guided imitation learning on trajectories that explicitly demonstrate tool trials and self-correction, then applies reinforcement learning to jointly optimize tool trialing and execution. Experiments show that ToolMaster significantly outperforms existing baselines in generalization and robustness on tasks involving unseen or unfamiliar tools.

📝 Abstract

Equipping Large Language Models (LLMs) with external tools enables them to solve complex real-world problems. However, the robustness of existing methods remains a critical challenge when confronting novel or evolving tools. Existing trajectory-centric paradigms primarily rely on memorizing static solution paths during training, which limits the ability of LLMs to generalize tool usage to newly introduced or previously unseen tools. In this paper, we propose ToolMaster, a framework that shifts tool use from imitating golden tool-calling trajectories to actively learning tool usage through interaction with the environment. To optimize LLMs for tool planning and invocation, ToolMaster adopts a trial-and-execution paradigm, which trains LLMs to first imitate teacher-generated trajectories containing explicit tool trials and self-correction, followed by reinforcement learning to coordinate the trial and execution phases jointly. This process enables agents to autonomously explore correct tool usage by actively interacting with environments and forming experiential knowledge that benefits tool execution. Experimental results demonstrate that ToolMaster significantly outperforms existing baselines in terms of generalization and robustness across unseen or unfamiliar tools. All code and data are available at https://github.com/NEUIR/ToolMaster.
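The trial-and-execution idea described in the abstract can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the ToolMaster implementation: the `ToyEnvironment` class, the `convert_currency` tool, its error-message format, the fixed exchange rate, and the `trial_phase` and `execution_phase` helpers are all hypothetical names invented for this example. A real system would have an LLM propose the calls and would learn the policy through imitation and reinforcement learning; here a hand-written rule stands in for the model so the control flow is visible.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyEnvironment:
    """Hypothetical environment exposing one tool whose signature is not documented."""
    calls: int = 0

    def invoke(self, tool: str, **kwargs: Any) -> dict:
        self.calls += 1
        if tool != "convert_currency":
            return {"ok": False, "error": f"unknown tool: {tool}"}
        required = {"amount", "source", "target"}
        missing = required - kwargs.keys()
        if missing:
            return {"ok": False,
                    "error": "missing arguments: " + ", ".join(sorted(missing))}
        rates = {("USD", "EUR"): 0.5}  # made-up rate, for illustration only
        rate = rates.get((kwargs["source"], kwargs["target"]))
        if rate is None:
            return {"ok": False, "error": "unsupported currency pair"}
        return {"ok": True, "result": kwargs["amount"] * rate}


def trial_phase(env: ToyEnvironment, tool: str, probe_values: dict,
                max_trials: int = 3) -> tuple[Optional[list], list]:
    """Probe the tool with cheap calls and record what the feedback reveals."""
    notes: list = []
    args: dict = {}
    prefix = "missing arguments: "
    for _ in range(max_trials):
        reply = env.invoke(tool, **args)
        if reply["ok"]:
            notes.append(f"working argument names: {sorted(args)}")
            return sorted(args), notes
        notes.append(reply["error"])
        if not reply["error"].startswith(prefix):
            break  # feedback this toy rule cannot act on
        # Self-correction: add every argument the error message asks for.
        for name in reply["error"][len(prefix):].split(", "):
            args[name] = probe_values[name]
    return None, notes


def execution_phase(env: ToyEnvironment, tool: str, learned_names: list,
                    task_args: dict) -> dict:
    """Solve the real task using the argument names learned during trialing."""
    call = {name: task_args[name] for name in learned_names}
    return env.invoke(tool, **call)


env = ToyEnvironment()
probe = {"amount": 1.0, "source": "USD", "target": "EUR"}
learned, notes = trial_phase(env, "convert_currency", probe)
outcome = execution_phase(env, "convert_currency", learned,
                          {"amount": 10.0, "source": "USD", "target": "EUR"})
print(notes)
print(outcome)
```

In this sketch the first trial call fails, the error feedback is turned into a corrected call, and only then is the actual task executed. The `notes` list plays the role of the experiential knowledge gathered during trialing.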

Problem

Research questions and friction points this paper is trying to address.

tool usage generalization

LLM robustness

novel tools

trajectory memorization

tool interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

tool learning

trial-and-execution

environment interaction