Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

📅 2026-04-30
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work investigates whether tool-augmented reasoning outperforms native chain-of-thought (CoT) in the presence of semantic distractors, and explains the mechanisms behind the observed performance degradation. Using a Factorized Intervention Framework, the study separates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. It introduces the "tool-use tax": the performance penalty imposed by the tool-calling protocol itself. To reduce this penalty, the authors propose G-STEP, a lightweight inference-time gate that modulates tool invocation during reasoning. Experiments show that under semantic noise the gains from tools often fail to offset the protocol overhead, leaving tool-augmented reasoning below native CoT. G-STEP recovers part of the loss, but more substantial improvement depends on strengthening the model's intrinsic reasoning and tool-interaction capabilities.
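The summary does not describe how G-STEP decides when to open its gate, so the following is only a generic illustration of inference-time gating, not the paper's mechanism. It assumes a hypothetical scalar score `tool_confidence` and a hypothetical `threshold`; the names `gated_step`, `call_tool`, and `reason_natively` are invented for this sketch.

```python
from typing import Callable, Tuple


def gated_step(
    question: str,
    tool_confidence: float,
    threshold: float,
    call_tool: Callable[[str], str],
    reason_natively: Callable[[str], str],
) -> Tuple[str, str]:
    """Route one reasoning step through a tool only when the gate opens.

    ASSUMPTION: the gate is a simple threshold on a confidence score.
    The source does not specify G-STEP's actual decision rule.
    Returns (route, answer), where route is "tool" or "cot".
    """
    if tool_confidence >= threshold:
        # Gate open: pay the protocol overhead in exchange for tool output.
        return "tool", call_tool(question)
    # Gate closed: stay in native chain-of-thought and skip the protocol.
    return "cot", reason_natively(question)


# Stand-in callables so the sketch runs without any model or tool backend.
route, answer = gated_step(
    "What is 17 * 23?",
    tool_confidence=0.9,
    threshold=0.5,
    call_tool=lambda q: "391",
    reason_natively=lambda q: "about 390",
)
```

The point of the sketch is the control flow: a gate of this kind can only avoid protocol overhead on steps where it declines the tool, which is consistent with the summary's note that gating yields partial rather than full recovery.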
📝 Abstract
Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the "tool-use tax", which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings suggest that more substantial improvements still require strengthening the model's intrinsic reasoning and tool-interaction capabilities.
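One way to read the three-way split in the abstract is as differences between accuracies measured under progressively richer conditions. The sketch below is a minimal illustration under that assumption; the four conditions, the field names, and the example numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConditionAccuracy:
    """Accuracy in [0, 1] under four ASSUMED evaluation conditions."""
    native_cot: float        # plain chain-of-thought prompt
    formatted_cot: float     # tool-style prompt formatting, no tool calls
    protocol_no_exec: float  # tool-calling protocol active, tool output withheld
    full_tool_use: float     # tool calls executed and results returned


def decompose(acc: ConditionAccuracy) -> Dict[str, float]:
    """Split the gap between full tool use and native CoT into three terms."""
    formatting_cost = acc.native_cot - acc.formatted_cot
    protocol_overhead = acc.formatted_cot - acc.protocol_no_exec  # the "tool-use tax"
    tool_gain = acc.full_tool_use - acc.protocol_no_exec
    return {
        "formatting_cost": formatting_cost,
        "protocol_overhead": protocol_overhead,
        "tool_gain": tool_gain,
        # Identity: net = tool_gain - formatting_cost - protocol_overhead
        "net_vs_native_cot": acc.full_tool_use - acc.native_cot,
    }


# Made-up numbers for illustration only; these are not results from the paper.
example = decompose(ConditionAccuracy(0.70, 0.68, 0.60, 0.66))
```

With these illustrative values the tool gain (0.06) is smaller than the protocol overhead (0.08) plus the formatting cost (0.02), so the net effect against native CoT is negative. That is the tradeoff the abstract describes under semantic noise.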
Problem

Research questions and friction points this paper is trying to address.

tool-augmented reasoning
semantic distractors
tool-use tax
chain-of-thought
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-use tax
Factorized Intervention Framework
tool-augmented reasoning
G-STEP
semantic distractors