Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis

📅 2025-02-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the critical challenge of hallucination generation by large language models (LLMs), which impedes their real-world deployment. We propose the first probabilistic hallucination detection framework grounded in distributional divergence—measured via Minkowski distance—in the embedding space of pre-trained models. We empirically discover and rigorously validate that hallucinated and factual texts exhibit statistically significant, scale-invariant shifts in their embedding-distance distributions. Leveraging this insight, we formulate a mathematically principled hypothesis-testing paradigm to quantify hallucination probability without fine-tuning or additional annotations. Evaluated on standard benchmarks, our method achieves 66% detection accuracy—matching state-of-the-art performance—while offering full interpretability and cross-model generalizability. This establishes a novel, theoretically grounded pathway for trustworthy LLM evaluation.

Technology Category

Application Category

📝 Abstract
Hallucinations are one of the major issues affecting LLMs, hindering their wide adoption in production systems. While current research solutions for detecting hallucinations are mainly based on heuristics, in this paper we introduce a mathematically sound methodology to reason about hallucination, and leverage it to build a tool to detect hallucinations. To the best of our knowledge, we are the first to show that hallucinated content has structural differences with respect to correct content. To prove this result, we resort to the Minkowski distances in the embedding space. Our findings demonstrate statistically significant differences in the embedding distance distributions, that are also scale free -- they qualitatively hold regardless of the distance norm used and the number of keywords, questions, or responses. We leverage these structural differences to develop a tool to detect hallucinated responses, achieving an accuracy of 66% for a specific configuration of system parameters -- comparable with the best results in the field. In conclusion, the suggested methodology is promising and novel, possibly paving the way for further research in the domain, also along the directions highlighted in our future work.
Problem

Research questions and friction points this paper is trying to address.

Detect hallucinations in LLMs
Use embeddings distance analysis
Develop novel detection tool
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minkowski distances analyze embeddings
Structural differences detect hallucinations
Probabilistic framework improves accuracy
🔎 Similar Papers
No similar papers found.
E
Emanuele Ricco
King-Abdullah University of Science and Technology (KAUST), CEMSE division, Thuwal, Saudi Arabia
L
Lorenzo Cima
University of Pisa, Department of Information Engineering, Pisa, Italy; IIT-CNR, Pisa, Italy
Roberto Di Pietro
Roberto Di Pietro
IEEE Fellow; ACM Distinguished Scientist; Full Professor of Cybersecurity, KAUST
AI driven CybersecurityDistributed Systems SecurityWireless SecurityOSN Security and Privacy