🤖 AI Summary
This work addresses the limitations of existing tool-integrated reasoning methods, which rely on external tool documentation and consequently suffer from poor tool mastery, limited scalability, and low reasoning efficiency. To overcome these challenges, the authors propose TInR-U, a tool-internalized reasoning framework that embeds tool knowledge directly within large language models, thereby unifying reasoning and tool usage. TInR-U is trained with a three-phase pipeline (tool internalization with a bidirectional knowledge alignment strategy, supervised fine-tuning warm-up, and reinforcement learning with TInR-specific rewards) that removes the dependence on external documents during reasoning. Experimental results show that TInR-U achieves superior performance in both in-domain and out-of-domain settings, supporting its effectiveness and efficiency.
📝 Abstract
Tool-Integrated Reasoning (TIR) has emerged as a promising direction that extends the capabilities of Large Language Models (LLMs) with external tools during reasoning. Existing TIR methods typically rely on external tool documentation during reasoning. However, this reliance makes tools difficult to master, constrains the number of tools that can be used, and makes inference inefficient. To mitigate these issues, we explore Tool-Internalized Reasoning (TInR), which aims to facilitate reasoning with tool knowledge internalized into LLMs. Achieving this goal presents notable requirements, including tool internalization and tool-reasoning coordination. To address them, we propose TInR-U, a tool-internalized reasoning framework for unified reasoning and tool usage. TInR-U is trained through a three-phase pipeline: 1) tool internalization with a bidirectional knowledge alignment strategy; 2) supervised fine-tuning warm-up using high-quality reasoning annotations; and 3) reinforcement learning with TInR-specific rewards. We comprehensively evaluate our method in both in-domain and out-of-domain settings. Experimental results show that TInR-U achieves superior performance in both settings, highlighting its effectiveness and efficiency.
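The abstract does not say how the bidirectional knowledge alignment in phase 1 is implemented. As a rough illustration only, the sketch below assumes that "bidirectional" means building training pairs in both directions, from a tool's name to its documentation and from the documentation back to the name. The `Tool` class, the prompt wording, the `alignment_pairs` helper, and the phase labels are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str         # hypothetical tool identifier
    description: str  # hypothetical documentation text for the tool


def alignment_pairs(tools):
    """Build (prompt, target) text pairs in both directions for each tool.

    Assumption: this is one plausible reading of "bidirectional knowledge
    alignment", not the authors' actual data construction.
    """
    pairs = []
    for tool in tools:
        # Direction 1: tool name -> documentation (what does the tool do?)
        pairs.append((f"Describe the tool: {tool.name}", tool.description))
        # Direction 2: documentation -> tool name (which tool fits?)
        pairs.append((f"Which tool does this? {tool.description}", tool.name))
    return pairs


# The three phases named in the abstract, in order (labels are illustrative).
PIPELINE = ("tool_internalization", "sft_warmup", "reinforcement_learning")

tools = [Tool("get_weather", "Return the current weather for a city.")]
pairs = alignment_pairs(tools)
```

Under this assumption each tool contributes two examples, so that a model fine-tuned on them would no longer need the documentation in its prompt; the later phases (supervised warm-up and reinforcement learning) are not modeled here.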