AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints

📅 2026-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency in tool-augmented reinforcement learning, where models often suffer from either excessive reasoning on simple tasks or insufficient reasoning depth on complex ones, limiting scalability. To overcome this, the authors propose a novel paradigm that integrates supervised fine-tuning with reinforcement learning, augmented by a decoupled entropy-based constraint mechanism. This mechanism enables the model to automatically assess problem complexity and dynamically select an optimal reasoning length. The approach combines supervised warm-up, reinforcement learning optimization, and entropy-driven length control to maintain reasoning diversity while significantly improving scalability. Experimental results demonstrate that the method achieves an average accuracy gain of 9.8% across three benchmarks while reducing computational overhead by approximately 81%.

📝 Abstract
Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) to scale up the explicit reasoning process for better performance. However, current RL-based scaling approaches face key challenges for tool use: (a) direct RL training often struggles to scale up thinking length sufficiently to solve complex problems, and (b) scaled-up models tend to overthink simpler problems, resulting in substantial token inefficiency. To address these challenges, we propose a novel training paradigm that first employs warm-up supervised fine-tuning to help models distinguish between simple and complex problems, followed by RL that enables models to automatically determine appropriate reasoning trajectories. Furthermore, to tackle the issue of automatic thinking-length scaling, we discover that entropy-based optimization objectives effectively maintain model diversity while unlocking the model's scaling capabilities. Based on this insight, we introduce an entropy-based long-short reasoning fusion RL strategy. Experiments on three benchmarks demonstrate that the model successfully achieves auto-scaling for efficient tool use, delivering a significant 9.8% accuracy improvement while reducing computational overhead by ~81%.
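The entropy-based objective the abstract describes can be sketched as a standard policy-gradient term plus separate entropy bonuses for different token groups. This is a minimal illustrative sketch under stated assumptions, not the paper's actual formulation: the decoupling of entropy into reasoning-token vs. tool-call-token terms, the function names, and the coefficients `beta_reason`/`beta_tool` are all hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decoupled_entropy_objective(logps, advantages, entropies, is_reasoning,
                                beta_reason=0.01, beta_tool=0.001):
    """Hypothetical decoupled-entropy RL objective (to be maximized).

    logps        -- per-token log-probabilities of the sampled actions
    advantages   -- per-token advantage estimates
    entropies    -- per-token policy entropies
    is_reasoning -- True for reasoning tokens, False for tool-call tokens
    beta_*       -- assumed per-group entropy coefficients (not from the paper)
    """
    n = len(logps)
    # policy-gradient surrogate: mean of log-prob-weighted advantages
    pg = sum(lp * a for lp, a in zip(logps, advantages)) / n
    # decoupled entropy bonuses: average entropy within each token group
    reason = [e for e, r in zip(entropies, is_reasoning) if r]
    tool = [e for e, r in zip(entropies, is_reasoning) if not r]
    h_reason = sum(reason) / len(reason) if reason else 0.0
    h_tool = sum(tool) / len(tool) if tool else 0.0
    return pg + beta_reason * h_reason + beta_tool * h_tool
```

The intent of keeping two separate coefficients is that exploration pressure on reasoning tokens (which governs thinking length) can be tuned independently of exploration on tool-call tokens (which must stay well-formed); the paper's own constraint mechanism may differ substantially.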
Problem

Research questions and friction points this paper is trying to address.

tool use
reinforcement learning
reasoning length
token inefficiency
scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-scaling
Entropy-based RL
Tool-use reasoning
Decoupled entropy constraints
Reasoning trajectory optimization
Yirong Zeng
Harbin Institute of Technology, SCIR Lab
Xiao Ding
Harbin Institute of Technology
Natural Language Processing, Artificial Intelligence
Yufei Liu
Peking University
Yuxian Wang
Huawei Technologies Ltd.
Qunyao Du
Harbin Institute of Technology, SCIR Lab
Yutai Hou
Huawei
LLM, NLP, Dialogue, Alignment, Meta Learning
Wu Ning
Huawei Technologies Ltd.
Haonan Song
Huawei Technologies Ltd.
Duyu Tang
Huawei
Natural Language Processing
Dandan Tu
Huawei Technologies Ltd.
Bing Qin
Professor at Harbin Institute of Technology
Natural Language Processing, Information Extraction, Sentiment Analysis
Ting Liu
Harbin Institute of Technology, SCIR Lab