Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle with reliable multi-step tool invocation and precise execution in real-world tasks. Method: The paper proposes Tool-R1, a tool-augmented reinforcement learning framework in which the model generates executable Python code to orchestrate compositional tool calls, supporting user-defined tool integration and variable sharing across steps. A sparse reward function grounded in execution outcomes, combined with a dynamic sample queue that caches and reuses high-quality trajectories, improves policy-optimization efficiency and substantially reduces online sampling overhead. Contribution/Results: On the GAIA benchmark, Tool-R1 achieves roughly a 10% absolute accuracy improvement over strong baselines, with the largest gains on complex multi-step tasks, demonstrating superior robustness and generalization and establishing an efficient, scalable paradigm for LLM-driven tool-augmented reasoning.
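The outcome-based reward described above combines an LLM-based judgment of the final answer with a code-execution success signal. A minimal sketch, assuming hypothetical weights and helper names (the summary does not specify exact values or interfaces):

```python
def outcome_reward(answer_correct: bool, code_executed_ok: bool,
                   w_answer: float = 1.0, w_exec: float = 0.1) -> float:
    """Sparse, outcome-grounded reward: a large term for an LLM-judged
    correct final answer plus a small term for error-free execution of
    the generated Python steps. Weights here are illustrative only."""
    reward = 0.0
    if answer_correct:      # judged by an LLM against the reference answer
        reward += w_answer
    if code_executed_ok:    # all generated code steps ran without error
        reward += w_exec
    return reward
```

Because the reward depends only on the final outcome rather than per-step supervision, it stays sparse: most intermediate actions receive no direct signal.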

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in language understanding and reasoning, yet they remain limited when tackling real-world tasks that require up-to-date knowledge, precise operations, or specialized tool use. To address this, we propose Tool-R1, a reinforcement learning framework that enables LLMs to perform general, compositional, and multi-step tool use by generating executable Python code. Tool-R1 supports integration of user-defined tools and standard libraries, with variable sharing across steps to construct coherent workflows. An outcome-based reward function, combining LLM-based answer judgment and code execution success, guides policy optimization. To improve training efficiency, we maintain a dynamic sample queue to cache and reuse high-quality trajectories, reducing the overhead of costly online sampling. Experiments on the GAIA benchmark show that Tool-R1 substantially improves both accuracy and robustness, achieving about 10% gain over strong baselines, with larger improvements on complex multi-step tasks. These results highlight the potential of Tool-R1 for enabling reliable and efficient tool-augmented reasoning in real-world applications. Our code will be available at https://github.com/YBYBZhang/Tool-R1.
Problem

Research questions and friction points this paper is trying to address.

Enabling LLMs to perform multi-step tool use
Improving sample efficiency in reinforcement learning
Enhancing accuracy and robustness in tool-augmented reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning framework for tool use
Generates executable Python code for workflows
Dynamic sample queue for efficient training
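The dynamic sample queue above caches high-quality rollouts so they can be replayed during policy updates instead of always re-sampling online. A minimal sketch under assumed details (the capacity, reward threshold, and replay policy are hypothetical, not taken from the paper):

```python
from collections import deque

class SampleQueue:
    """Illustrative dynamic sample queue: keeps only trajectories whose
    reward clears a threshold, evicting the oldest entries when full."""

    def __init__(self, maxlen: int = 256, min_reward: float = 0.5):
        self.buffer = deque(maxlen=maxlen)   # oldest trajectories evicted first
        self.min_reward = min_reward         # cache only high-quality rollouts

    def push(self, trajectory, reward: float) -> None:
        if reward >= self.min_reward:
            self.buffer.append((trajectory, reward))

    def sample(self, k: int):
        # Replay up to k cached trajectories, preferring higher rewards.
        ranked = sorted(self.buffer, key=lambda item: item[1], reverse=True)
        return ranked[:k]
```

Reusing cached trajectories amortizes the cost of online code-executing rollouts, which is where the claimed sample-efficiency gain comes from.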