🤖 AI Summary
Large language models are prone to generating hard-to-detect hallucinations in reasoning tasks, which undermines their safety and reliability. This work reframes hallucination detection as an out-of-distribution (OOD) detection problem: by modeling next-token prediction as a classification task, well-studied OOD techniques can be applied to language models once their structural differences are accounted for. Building on a geometric perspective, the authors design a novel OOD criterion that yields a training-free, single-sample detector, achieving strong accuracy in identifying hallucinations on reasoning tasks where existing methods are less effective. In doing so, the work points to a scalable new path toward improving the safety of large language models.
📝 Abstract
Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
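To make the core idea concrete, here is a minimal sketch of what "treating next-token prediction as classification and scoring it with an OOD criterion" can look like. The paper's specific geometric criterion is not described above, so the sketch substitutes the classic energy score over the next-token logits, a standard training-free, single-sample OOD signal; the model name, the Hugging Face transformers usage, and the max-over-tokens aggregation are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: the paper's geometric OOD criterion is not specified
# here, so we use the energy score over next-token logits as a stand-in for a
# training-free, single-sample OOD score. Model choice and aggregation are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM exposes logits the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def token_ood_scores(text: str) -> torch.Tensor:
    """Treat each next-token prediction as a classification over the vocabulary
    and score it with the energy -logsumexp(logits); higher values suggest the
    prediction lies farther from the model's training distribution."""
    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits[0]          # (seq_len, vocab_size)
    energy = -torch.logsumexp(logits, dim=-1)   # one OOD score per position
    return energy

scores = token_ood_scores("The square root of 144 is 13.")
# Assumption: a response is flagged when its worst per-token score exceeds a
# threshold calibrated on generations known to be faithful.
print(scores.max().item())
```

In practice, the per-token scores would be aggregated into a response-level score and compared against a calibrated threshold; the choice of aggregation (max, mean, or last-token) and of the underlying criterion are exactly the design decisions the paper's geometric approach addresses.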