PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

📅 2026-05-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the performance degradation and resource inefficiency that erroneous tool invocations cause when large language models (LLMs) perform tool-integrated reasoning. To mitigate this issue, the authors propose PruneTIR, a framework that intervenes in the tool-calling process at inference time and requires no additional training. PruneTIR combines three mechanisms: Success-Triggered Pruning, Stuck-Triggered Pruning and Resampling, and Retry-Triggered Tool Suspension. Together they use the LLM's analysis of its own reasoning trajectory, alongside heuristic strategies, to prune incorrect calls, resample alternatives, or temporarily suspend problematic tools. Experimental results show that PruneTIR significantly improves Pass@1, shortens the working context, and improves overall inference efficiency.
📝 Abstract
Tool-integrated reasoning (TIR) enables large language models (LLMs) to enhance their capabilities by interacting with external tools, such as code interpreters (CI). Most recent studies focus on methods for equipping LLMs with the ability to use tools. However, how to further boost the reasoning ability of already tool-capable LLMs at inference time remains underexplored. Improving reasoning at inference time requires no additional training and can help LLMs better leverage tools to solve problems. We observe that, during tool-capable LLM inference, both the number and the proportion of erroneous tool calls are negatively correlated with answer correctness. Moreover, erroneous tool calls are typically resolved successfully within a few subsequent turns; when they are not, LLMs often struggle to resolve them even with many additional turns. Building on these observations, we propose PruneTIR, an effective yet efficient framework that enhances tool-integrated reasoning at inference time. During LLM inference, PruneTIR prunes trajectories, resamples tool calls, and suspends tool usage through three components: Success-Triggered Pruning, Stuck-Triggered Pruning and Resampling, and Retry-Triggered Tool Suspension. These three components enable PruneTIR to mitigate the negative impact of erroneous tool calls and prevent LLMs from getting stuck in repeated failed resolution attempts, thereby improving overall LLM performance. Extensive experimental results demonstrate the effectiveness of PruneTIR, which significantly improves Pass@1 and efficiency while reducing the working context length for tool-capable LLMs.
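The abstract names the three components but not their exact triggers, so the following is only a rough sketch of how such a control loop might look, not the authors' implementation. The names `Turn`, `State`, `step`, and `run`, the thresholds `stuck_after` and `max_resamples`, and the rule that suspension follows an exhausted resample budget are all assumptions made for illustration. The tool results are simulated as booleans instead of coming from a real model or code interpreter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    ok: bool  # whether the (simulated) tool call succeeded


@dataclass
class State:
    context: List[Turn] = field(default_factory=list)  # working context kept for the model
    failed_streak: int = 0      # consecutive failed calls currently in the context
    resamples: int = 0          # stuck-triggered resamples used so far
    tools_enabled: bool = True  # False once the tool has been suspended


def step(state: State, turn: Turn, stuck_after: int = 2, max_resamples: int = 2) -> str:
    """Fold one tool-call result into the state and return the action taken.

    stuck_after and max_resamples are hypothetical thresholds, not values from the paper.
    """
    if turn.ok:
        action = "keep"
        if state.failed_streak:
            # Success-triggered pruning (assumed form): once an error is resolved,
            # drop the failed attempts that preceded the successful call.
            del state.context[-state.failed_streak:]
            action = "success_prune"
        state.context.append(turn)
        state.failed_streak = 0
        return action

    state.context.append(turn)
    state.failed_streak += 1
    if state.failed_streak < stuck_after:
        return "keep"  # give the model a few turns to fix the error itself

    # The error was not resolved within a few turns: prune the failed streak.
    del state.context[-state.failed_streak:]
    state.failed_streak = 0
    if state.resamples < max_resamples:
        # Stuck-triggered pruning and resampling (assumed form): ask for a fresh call.
        state.resamples += 1
        return "stuck_prune_resample"
    # Retry-triggered tool suspension (assumed form): stop offering the tool.
    state.tools_enabled = False
    return "suspend_tool"


def run(outcomes: List[bool]) -> tuple:
    """Replay a list of simulated tool outcomes; return the actions and final state."""
    state = State()
    actions = []
    for ok in outcomes:
        if not state.tools_enabled:
            break
        actions.append(step(state, Turn(ok)))
    return actions, state


actions, final = run([False, True, False, False, False, False, False, False])
print(actions)
print(len(final.context), final.tools_enabled)
```

In this toy replay, the first failure is pruned as soon as the next call succeeds, later unresolved failures trigger pruning with a resample, and the tool is suspended once the assumed resample budget runs out. Only the successful call remains in the working context, which illustrates how pruning could shorten the context the model has to carry.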
Problem

Research questions and friction points this paper is trying to address.

tool-integrated reasoning
inference-time optimization
erroneous tool calls
large language models
reasoning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tool-Integrated Reasoning
Inference-Time Pruning
Large Language Models
Tool Call Optimization
Trajectory Pruning
🔎 Similar Papers
Luan Zhang
School of Computer Science and Technology, Beijing Institute of Technology, China
Dandan Song
School of Computer Science and Technology, Beijing Institute of Technology, China
Zhijing Wu
Beijing Institute of Technology
Information Retrieval, Natural Language Processing
Zhengyu Chen
Independent, China
Chen Zhang
Unknown affiliation
Yuhang Tian
School of Computer Science and Technology, Beijing Institute of Technology, China
Huipeng Ma
School of Computer Science and Technology, Beijing Institute of Technology, China
Chenhao Li
ETH Zurich; Massachusetts Institute of Technology
Deep Learning, Reinforcement Learning, Robotics
Changzhi Zhou
School of Computer Science and Technology, Beijing Institute of Technology, China
Xudong Li
School of Computer Science and Technology, Beijing Institute of Technology, China
Shuhao Zhang
Professor, School of Computer Science and Technology, Huazhong University of Science and Technology
Data Stream Processing, Parallel and Distributed Computing