Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis

📅 2025-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the critical challenge of hallucination generation by large language models (LLMs), which impedes their real-world deployment. We propose the first probabilistic hallucination detection framework grounded in distributional divergence—measured via Minkowski distance—in the embedding space of pre-trained models. We empirically discover and rigorously validate that hallucinated and factual texts exhibit statistically significant, scale-invariant shifts in their embedding-distance distributions. Leveraging this insight, we formulate a mathematically principled hypothesis-testing paradigm to quantify hallucination probability without fine-tuning or additional annotations. Evaluated on standard benchmarks, our method achieves 66% detection accuracy—matching state-of-the-art performance—while offering full interpretability and cross-model generalizability. This establishes a novel, theoretically grounded pathway for trustworthy LLM evaluation.

Technology Category

Application Category

📝 Abstract

Hallucinations are one of the major issues affecting LLMs, hindering their wide adoption in production systems. While current research solutions for detecting hallucinations are mainly based on heuristics, in this paper we introduce a mathematically sound methodology to reason about hallucination, and leverage it to build a tool to detect hallucinations. To the best of our knowledge, we are the first to show that hallucinated content has structural differences with respect to correct content. To prove this result, we resort to the Minkowski distances in the embedding space. Our findings demonstrate statistically significant differences in the embedding distance distributions, that are also scale free -- they qualitatively hold regardless of the distance norm used and the number of keywords, questions, or responses. We leverage these structural differences to develop a tool to detect hallucinated responses, achieving an accuracy of 66% for a specific configuration of system parameters -- comparable with the best results in the field. In conclusion, the suggested methodology is promising and novel, possibly paving the way for further research in the domain, also along the directions highlighted in our future work.

Problem

Research questions and friction points this paper is trying to address.

Detect hallucinations in LLMs

Use embeddings distance analysis

Develop novel detection tool

Innovation

Methods, ideas, or system contributions that make the work stand out.

Minkowski distances analyze embeddings

Structural differences detect hallucinations

Probabilistic framework improves accuracy

🔎 Similar Papers

No similar papers found.

Authors to Follow