Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?

📅 2025-10-13

🤖 AI Summary

This work investigates the cross-domain generalization of tool-augmented reinforcement learning (RL) by asking a single question: can an LLM agent that is equipped with a code interpreter and trained exclusively on mathematical reasoning transfer its learned tool-use policy to non-mathematical domains? To answer it, the authors propose the Tool Generalization Reinforcement Learning (TGRL) framework, which has three components: (1) a standardized, domain-agnostic tool interface; (2) a dual-component reward mechanism that decouples correctness from efficiency; and (3) an XML-structured prompt template that unifies tool invocation patterns. Evaluated on non-mathematical reasoning tasks, including logic puzzles, symbolic reasoning, and commonsense QA, TGRL improves both task success rate and token efficiency and attains state-of-the-art performance. The results offer systematic empirical evidence that tool-use policies learned through RL on mathematical tasks transfer to other domains.

📝 Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains underexplored. In this work, we investigate the cross-domain generalization of an LLM agent that is equipped with a code interpreter tool and trained exclusively on mathematical problem-solving tasks. Although training is restricted to this single domain, we evaluate the agent's performance across several distinct reasoning domains. The results reveal that RL-based tool usage learned from mathematical tasks transfers effectively to complex tasks in other domains, yielding strong task performance and high token efficiency. To facilitate this cross-domain transfer, we propose a Tool Generalization Reinforcement Learning (TGRL) framework designed to promote domain-agnostic learning and skill migration, encompassing: (i) a standardized tool interface that abstracts domain-specific nuances through consistent formatting and explicit termination, fostering transferable invocation patterns; (ii) a dual-component reward system that decomposes rewards to incentivize generalizable behaviors like tool efficiency and reasoning abstraction, ensuring alignment and robustness across domain shifts; and (iii) an XML-based prompt template that separates thinking, tool calls, and responses to encourage modular, domain-invariant planning and coherent multi-turn interactions. Extensive experiments across diverse benchmarks validate our approach, which achieves state-of-the-art performance and highlights the cross-domain potential of Tool RL for LLM reasoning.
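
To make component (iii) concrete, the following is a minimal Python sketch of how a model turn written in an XML-style template might be split into its thinking, tool-call, and response parts. The abstract does not give the actual tag names or parsing rules, so the tags `think`, `tool_call`, and `answer`, the `ParsedTurn` class, and the `parse_turn` function are all hypothetical names chosen for illustration, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag names: the source only states that the template separates
# thinking, tool calls, and responses; it does not name the tags.
TAG_PATTERN = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)


@dataclass
class ParsedTurn:
    thinking: Optional[str] = None
    tool_call: Optional[str] = None
    answer: Optional[str] = None

    @property
    def is_terminal(self) -> bool:
        # Assumption: a turn that contains an answer block ends the episode
        # (one possible reading of "explicit termination").
        return self.answer is not None


def parse_turn(text: str) -> ParsedTurn:
    """Extract the first occurrence of each recognized block from a model turn."""
    turn = ParsedTurn()
    for tag, body in TAG_PATTERN.findall(text):
        body = body.strip()
        if tag == "think" and turn.thinking is None:
            turn.thinking = body
        elif tag == "tool_call" and turn.tool_call is None:
            turn.tool_call = body
        elif tag == "answer" and turn.answer is None:
            turn.answer = body
    return turn
```

Under these assumptions, `parse_turn("<think>need a sum</think><tool_call>print(1 + 2)</tool_call>")` yields a non-terminal turn whose `tool_call` field holds the code to send to the interpreter, while a turn containing an `answer` block is treated as final. Keeping the parser independent of the task is what would make such a format reusable across domains.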

Problem

Research questions and friction points this paper is trying to address.

Investigating cross-domain generalization of tool-augmented reinforcement learning agents

Evaluating whether training on mathematical tasks transfers to complex reasoning in other domains

Developing a framework for domain-agnostic tool usage and skill migration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized tool interface for domain-agnostic learning

Dual-component reward system for generalizable behaviors

XML-based prompt template for modular planning
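
The dual-component reward listed above can be illustrated with a minimal Python sketch. The page does not state the reward formula, so everything here is an assumption made for illustration: the function name `dual_component_reward`, the tool-call budget `max_tool_calls`, the weight `efficiency_weight`, and the choice to grant the efficiency bonus only when the answer is correct.

```python
def dual_component_reward(
    is_correct: bool,
    num_tool_calls: int,
    max_tool_calls: int = 4,
    efficiency_weight: float = 0.2,
) -> float:
    """Combine a correctness term with a tool-efficiency term (hypothetical formula)."""
    if max_tool_calls <= 0:
        raise ValueError("max_tool_calls must be positive")

    # Component 1: outcome reward, 1.0 for a correct final answer, else 0.0.
    outcome = 1.0 if is_correct else 0.0

    # Component 2: efficiency in [0, 1]; fewer tool calls scores higher.
    used = min(max(num_tool_calls, 0), max_tool_calls)
    efficiency = 1.0 - used / max_tool_calls

    # Assumption: the efficiency bonus is gated on correctness so that a policy
    # cannot earn reward simply by skipping tool calls and guessing.
    return outcome + efficiency_weight * efficiency * outcome
```

With these illustrative defaults, a correct answer reached with two of four allowed tool calls scores about 1.1, a correct answer that uses the whole budget scores 1.0, and any incorrect answer scores 0.0. Because neither term refers to the task domain, the same signal could in principle be applied unchanged to non-mathematical tasks.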

🔎 Similar Papers

StepTool: Enhancing Multi-Step Tool Usage in LLMs through Step-Grained Reinforcement Learning