🤖 AI Summary
This work challenges the prevailing assumption that hallucinations stem from model uncertainty, revealing instead that large language models (LLMs) frequently generate factual errors with high confidence—even when they possess the correct underlying knowledge—a phenomenon the authors term "high-certainty hallucinations." Method: Using knowledge-probing techniques together with multiple uncertainty-quantification signals—including logit entropy, prediction confidence, and sampling variance—the authors empirically study this phenomenon across mainstream LLMs and benchmark datasets. Contribution/Results: The study shows that high-certainty hallucinations are consistent across models and datasets, distinctive enough to be singled out, and largely missed by conventional uncertainty-based detection; they also resist existing mitigation methods. These findings expose an overlooked class of hallucinations and motivate new detection and mitigation strategies for improving LLM factuality and safety.
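The uncertainty signals named above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hedged toy example, assuming access to a model's next-token logits and a set of sampled answers, showing how logit entropy, prediction confidence, and sampling variance are typically computed:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def logit_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution.
    Low entropy = the model is 'certain' about its prediction."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum())

def prediction_confidence(logits):
    """Probability mass assigned to the top-ranked token."""
    return float(softmax(logits).max())

def sampling_variance(sampled_answers):
    """Disagreement rate across repeated sampled generations:
    fraction of samples that differ from the majority answer.
    0.0 means all samples agree (high certainty)."""
    _, counts = np.unique(np.asarray(sampled_answers), return_counts=True)
    return float(1.0 - counts.max() / counts.sum())

# A high-certainty hallucination would score low on all three
# signals (low entropy, high confidence, low sampling variance)
# while the generated answer is nonetheless factually wrong --
# which is why these metrics fail to flag it.
```

Under the paper's framing, conventional detectors threshold on signals like these, so a wrong answer produced with peaked logits and self-consistent samples slips through.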
📝 Abstract
Large Language Models (LLMs) often generate outputs that lack grounding in real-world facts, a phenomenon known as hallucinations. Prior research has associated hallucinations with model uncertainty, leveraging this relationship for hallucination detection and mitigation. In this paper, we challenge the underlying assumption that all hallucinations are associated with uncertainty. Using knowledge detection and uncertainty measurement methods, we demonstrate that models can hallucinate with high certainty even when they have the correct knowledge. We further show that high-certainty hallucinations are consistent across models and datasets, distinctive enough to be singled out, and challenge existing mitigation methods. Our findings reveal an overlooked aspect of hallucinations, emphasizing the need to understand their origins and improve mitigation strategies to enhance LLM safety. The code is available at https://github.com/technion-cs-nlp/Trust_me_Im_wrong.