🤖 AI Summary
This work challenges the prevailing assumption that hallucinations stem from model uncertainty, revealing instead that large language models (LLMs) frequently generate factual errors with high confidence—even when they possess the correct underlying knowledge—a phenomenon the authors term "high-certainty hallucinations." Method: Using knowledge-probing techniques together with multiple uncertainty-quantification signals—including logit entropy, prediction confidence, and sampling variance—the authors empirically study this phenomenon across mainstream LLMs and benchmark datasets. Contribution/Results: The study shows that high-certainty hallucinations are consistent across models and datasets, distinctive enough to be singled out, and largely missed by conventional uncertainty-based detection; they also resist existing mitigation methods. These findings expose an overlooked class of hallucinations and motivate new detection and mitigation strategies for improving LLM factuality and safety.
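The uncertainty signals named above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hedged toy example, assuming access to a model's next-token logits and a set of sampled answers, showing how logit entropy, prediction confidence, and sampling variance are typically computed:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def logit_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution.
    Low entropy = the model is 'certain' about its prediction."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum())

def prediction_confidence(logits):
    """Probability mass assigned to the top-ranked token."""
    return float(softmax(logits).max())

def sampling_variance(sampled_answers):
    """Disagreement rate across repeated sampled generations:
    fraction of samples that differ from the majority answer.
    0.0 means all samples agree (high certainty)."""
    _, counts = np.unique(np.asarray(sampled_answers), return_counts=True)
    return float(1.0 - counts.max() / counts.sum())

# A high-certainty hallucination would score low on all three
# signals (low entropy, high confidence, low sampling variance)
# while the generated answer is nonetheless factually wrong --
# which is why these metrics fail to flag it.
```

Under the paper's framing, conventional detectors threshold on signals like these, so a wrong answer produced with peaked logits and self-consistent samples slips through.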
📝 Abstract
Large Language Models (LLMs) often generate outputs that lack grounding in real-world facts, a phenomenon known as hallucinations. Prior research has associated hallucinations with model uncertainty, leveraging this relationship for hallucination detection and mitigation. In this paper, we challenge the underlying assumption that all hallucinations are associated with uncertainty. Using knowledge detection and uncertainty measurement methods, we demonstrate that models can hallucinate with high certainty even when they have the correct knowledge. We further show that high-certainty hallucinations are consistent across models and datasets, distinctive enough to be singled out, and challenge existing mitigation methods. Our findings reveal an overlooked aspect of hallucinations, emphasizing the need to understand their origins and improve mitigation strategies to enhance LLM safety. The code is available at https://github.com/technion-cs-nlp/Trust_me_Im_wrong.