The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

📅 2025-10-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether enhancing the reasoning capabilities of large language models (LLMs) exacerbates tool hallucination, i.e., erroneous or unsupported tool invocations. Method: The study introduces SimpleToolHalluBench, a diagnostic benchmark that measures tool hallucination when no tool is available and when only distractor tools are available, and uses it to evaluate how reasoning enhancement (reinforcement learning, supervised fine-tuning, and step-by-step thinking elicited at inference) affects tool-call reliability. Contribution/Results: The experiments indicate that strengthening reasoning increases tool hallucination across enhancement methods, including when training uses non-tool tasks such as mathematics. Mechanistic analysis finds that reasoning RL disproportionately collapses tool-reliability-related representations, with hallucinations surfacing as amplified divergences concentrated in late-layer residual streams. The mitigation strategies evaluated, prompt engineering and direct preference optimization (DPO), reduce hallucination only at the cost of utility, pointing to a reliability-capability trade-off and to the need for training objectives that optimize both.

📝 Abstract

Enhancing the reasoning capabilities of Large Language Models (LLMs) is a key strategy for building Agents that "think then act." However, recent observations, such as those of OpenAI's o3, suggest a paradox: stronger reasoning often coincides with increased hallucination, yet no prior work has systematically examined whether reasoning enhancement itself causes tool hallucination. To address this gap, we pose the central question: Does strengthening reasoning increase tool hallucination? To answer this, we introduce SimpleToolHalluBench, a diagnostic benchmark measuring tool hallucination in two failure modes: (i) no tool available, and (ii) only distractor tools available. Through controlled experiments, we establish three key findings. First, we demonstrate a causal relationship: progressively enhancing reasoning through RL increases tool hallucination proportionally with task performance gains. Second, this effect transcends overfitting: training on non-tool tasks (e.g., mathematics) still amplifies subsequent tool hallucination. Third, the effect is method-agnostic, appearing when reasoning is instilled via supervised fine-tuning and when it is merely elicited at inference by switching from direct answers to step-by-step thinking. We also evaluate mitigation strategies including Prompt Engineering and Direct Preference Optimization (DPO), revealing a fundamental reliability-capability trade-off: reducing hallucination consistently degrades utility. Mechanistically, Reasoning RL disproportionately collapses tool-reliability-related representations, and hallucinations surface as amplified divergences concentrated in late-layer residual streams. These findings reveal that current reasoning enhancement methods inherently amplify tool hallucination, highlighting the need for new training objectives that jointly optimize for capability and reliability.
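
As a rough illustration of the two failure modes named in the abstract, the Python sketch below computes a per-setting hallucination rate. It is not the paper's evaluation code: the `Episode` record, the setting labels `"no_tool"` and `"distractor"`, the tool names, and the rule that any tool call counts as a hallucination in these settings are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    """One model response to a task that needs a tool the model does not have.

    Hypothetical record format, not taken from the paper.
    """
    setting: str                  # "no_tool" or "distractor" (assumed labels)
    available_tools: List[str]    # empty for "no_tool"; irrelevant tools otherwise
    called_tool: Optional[str]    # None if the model declined to call any tool


def is_hallucination(ep: Episode) -> bool:
    # Assumption: no suitable tool exists in either setting, so any call,
    # whether to an invented tool or to a distractor, counts as a hallucination.
    return ep.called_tool is not None


def hallucination_rates(episodes: List[Episode]) -> Dict[str, float]:
    """Return the fraction of hallucinated tool calls for each setting."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for ep in episodes:
        totals[ep.setting] += 1
        hits[ep.setting] += int(is_hallucination(ep))
    return {setting: hits[setting] / totals[setting] for setting in totals}


# Made-up episodes, for illustration only.
episodes = [
    Episode("no_tool", [], None),
    Episode("no_tool", [], "get_weather"),            # invented tool
    Episode("distractor", ["calculator"], None),
    Episode("distractor", ["calculator"], "calculator"),  # misused distractor
    Episode("distractor", ["calculator"], None),
    Episode("distractor", ["calculator"], None),
]

rates = hallucination_rates(episodes)
print(rates)
```

Comparing these rates for the same model before and after reasoning enhancement is the kind of measurement the abstract describes; the scoring rule actually used by SimpleToolHalluBench may differ from this simplification.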

Problem

Research questions and friction points this paper is trying to address.

Investigates whether enhanced reasoning increases tool hallucination in LLMs

Establishes causal link between reasoning improvement and hallucination growth

Reveals reliability-capability trade-off in current reasoning enhancement methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SimpleToolHalluBench diagnostic benchmark for tool hallucination

Establishes causal link between reasoning enhancement and tool hallucination

Identifies method-agnostic reliability-capability trade-off in reasoning methods

🔎 Similar Papers

Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models