Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation

📅 2025-06-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from hallucination, and while chain-of-thought (CoT) prompting reduces hallucination incidence by 12–28%, it concurrently degrades the performance of mainstream hallucination detection methods, reducing F1 scores by 9–34%. Method: This work empirically investigates how CoT, across zero-shot, few-shot, and self-consistency variants, affects hallucination characteristics in both instruction-tuned and reasoning-oriented LLMs, analyzing shifts in hallucination score distributions, detection accuracy, and output confidence. Contribution/Results: We reveal that CoT compromises detectability by smoothing the anomalous confidence peaks detectors rely on, through distortion of internal model states and output probability distributions, improving reasoning quality at the expense of diagnostic signal fidelity. This establishes an inherent trade-off between reasoning enhancement and hallucination detectability. Our findings provide insights for trustworthy LLM evaluation and robust detection method design. Code is publicly available.

📝 Abstract
Large Language Models (LLMs) often exhibit hallucinations, generating factually incorrect or semantically irrelevant content in response to prompts. Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning, but its impact on hallucination detection remains underexplored. To bridge this gap, we conduct a systematic empirical evaluation. We begin with a pilot experiment, revealing that CoT reasoning significantly affects the LLM's internal states and token probability distributions. Building on this, we evaluate the impact of various CoT prompting methods on mainstream hallucination detection methods across both instruction-tuned and reasoning-oriented LLMs. Specifically, we examine three key dimensions: changes in hallucination score distributions, variations in detection accuracy, and shifts in detection confidence. Our findings show that while CoT prompting helps reduce hallucination frequency, it also tends to obscure critical signals used for detection, impairing the effectiveness of various detection methods. Our study highlights an overlooked trade-off in the use of reasoning. Code is publicly available at: https://anonymous.4open.science/r/cot-hallu-detect.
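As a rough illustration of the kind of token-probability signal the abstract refers to, the sketch below scores a model's own answer with its mean negative log-likelihood and peak entropy. This is not the paper's detector; the model name ("gpt2"), the scoring choices, and the example prompt are placeholders chosen for a minimal runnable example.

```python
# Minimal sketch of a probability-based hallucination signal: mean token
# negative log-likelihood and peak entropy of an LLM's own answer.
# Model name and scoring choices are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def answer_scores(prompt: str, answer: str):
    """Score the answer tokens under the model, conditioned on the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits              # [1, seq, vocab]
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the answer span (positions after the prompt); assumes the
    # prompt re-tokenizes identically as a prefix of the full sequence.
    start = prompt_ids.shape[1] - 1
    answer_ll = token_ll[:, start:]
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(-1)[:, start:]
    return {
        "mean_nll": -answer_ll.mean().item(),   # higher -> less confident
        "max_entropy": entropy.max().item(),    # spiky uncertainty peak
    }

print(answer_scores("Q: Who wrote Hamlet?\nA:", " William Shakespeare."))
```

Detectors in this family flag answers whose scores exceed a threshold; the paper's observation is that CoT-style answers tend to smooth exactly these confidence peaks, weakening the detection signal even when hallucinations become less frequent.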
Problem

Research questions and friction points this paper is trying to address.

How CoT prompting affects hallucination detection in LLMs remains underexplored
CoT obscures the signals that detection methods rely on
Trade-off between CoT reasoning gains and hallucination detectability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically evaluates the impact of CoT prompting variants on mainstream hallucination detection methods
Analyzes shifts in internal states and token probability distributions induced by CoT
Identifies a trade-off between reasoning quality and the signals detectors rely on
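To make the paper's three evaluation dimensions concrete, the sketch below compares a detector's score distribution shift and F1 with and without CoT. The scores, labels, threshold, and distribution parameters are synthetic placeholders invented for illustration; they are not the paper's data or results.

```python
# Illustrative comparison of hallucination-score distributions with and
# without CoT: distribution shift (KS statistic) and detection accuracy (F1).
# All data below is synthetic placeholder data, not the paper's.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                  # 1 = hallucinated
scores_direct = rng.normal(loc=labels * 1.0, scale=0.5)  # clearer separation
scores_cot = rng.normal(loc=labels * 0.4, scale=0.5)     # smoothed signal

def detect_f1(scores, labels, threshold):
    """Threshold the score to get binary predictions and compute F1."""
    return f1_score(labels, (scores > threshold).astype(int))

print("KS shift between regimes:", ks_2samp(scores_direct, scores_cot).statistic)
print("F1 without CoT:", detect_f1(scores_direct, labels, 0.5))
print("F1 with CoT:   ", detect_f1(scores_cot, labels, 0.5))
```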
Jiahao Cheng
East China Normal University, Shanghai, China
Tiancheng Su
East China Normal University, Shanghai, China
Jia Yuan
University of Macau
Guoxiu He
East China Normal University, Shanghai, China
Jiawei Liu
Wuhan University, Wuhan, China
Xinqi Tao
Xiaohongshu Inc., Shanghai, China
Jingwen Xie
Xiaohongshu Inc., Shanghai, China
Huaxia Li
Xiaohongshu Inc., Shanghai, China