Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses hallucination in large language models, which undermines their reliability in high-stakes applications. The authors frame hallucination detection as a statistical hypothesis test and introduce the Calibrated Entropy Score (CES), a lightweight method that requires only a single forward pass and black-box access to token logits. CES uses the overall shape and tail behaviour of the token-level entropy distribution as a hallucination fingerprint, combining the mean and maximum entropy signals through a calibrated reference CDF. It comes with finite-sample calibration guarantees, obtained via a novel random-length Dvoretzky–Kiefer–Wolfowitz inequality, and its detection probability is proved to converge to one exponentially fast in the generation length. Evaluated across eight question-answering benchmarks and ten generator models, CES achieves the highest detection performance among single-pass black-box methods and is statistically indistinguishable from costly multi-sample approaches, while providing formal error guarantees that existing heuristics lack.
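For orientation, the classical fixed-sample-size Dvoretzky–Kiefer–Wolfowitz inequality is stated below. This is the standard textbook result, not the paper's random-length variant, which is not reproduced on this page. The symbols are assumptions introduced here for illustration: F is a reference CDF, F_n is the empirical CDF of n i.i.d. draws from F, and ε is a tolerance.

```latex
\[
\Pr\!\left( \sup_{x \in \mathbb{R}} \bigl| F_n(x) - F(x) \bigr| > \varepsilon \right)
\;\le\; 2\, e^{-2 n \varepsilon^{2}}, \qquad \varepsilon > 0 .
\]
```

The bound decays exponentially in n, which is the kind of behaviour the summary describes. How the paper handles a sample size that is itself random (the generation length) is not specified here.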
📝 Abstract
Large Language Models (LLMs) often generate factually incorrect outputs, commonly termed hallucinations, that undermine trust and limit deployment in high-stakes settings. Existing hallucination detection methods typically require multiple forward passes or access to model internals. In this work, we provide theoretical background and empirical evidence that the distribution of token-level entropies, beyond the mean captured by perplexity or length-normalised entropy, serves as a fingerprint of hallucination, with distributional shape and tail behaviour carrying independent signal. We formalize hallucination detection as a statistical hypothesis test and propose the Calibrated Entropy Score (CES), a lightweight algorithm requiring only a single forward pass and black-box access to token logits. CES combines the mean and the maximum of the generated token entropies through a calibrated reference CDF, producing scores that are directly comparable across models and tasks. We establish finite-sample calibration guarantees via a novel random-length Dvoretzky–Kiefer–Wolfowitz inequality, and also prove that CES detects hallucinations with probability converging to one exponentially fast in the generation length. Across eight QA benchmarks and ten generator models spanning open-source and API-access models, CES achieves the highest detection performance among all single-pass black-box methods while providing formal error guarantees that existing heuristics lack. Remarkably, CES is statistically indistinguishable from multi-sample methods that require far greater computational cost, closing the gap between lightweight and expensive detection and making it suitable for real-time, large-scale deployment.
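The ingredients named in the abstract (token-level entropies, their mean and maximum, and a calibrated reference CDF) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the use of plain empirical CDFs built from hypothetical calibration values `ref_means` and `ref_maxes`, and especially the rule for combining the two signals (taking the larger calibrated value) are assumptions made here for illustration, since the abstract does not give the exact formula.

```python
import numpy as np

def token_entropies(logits):
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: array-like of shape (num_tokens, vocab_size).
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum so the softmax is numerically stable.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)

def empirical_cdf(reference, x):
    """Fraction of reference values that are less than or equal to x."""
    reference = np.sort(np.asarray(reference, dtype=float))
    return float(np.searchsorted(reference, x, side="right") / reference.size)

def sketch_score(logits, ref_means, ref_maxes):
    """Illustrative score in [0, 1]; higher means more unusual entropy.

    ref_means / ref_maxes: hypothetical calibration samples of the mean and
    maximum token entropy, e.g. collected from generations known to be correct.
    ASSUMPTION: combining by max() is a placeholder, not the paper's rule.
    """
    h = token_entropies(logits)
    u_mean = empirical_cdf(ref_means, h.mean())
    u_max = empirical_cdf(ref_maxes, h.max())
    return max(u_mean, u_max)
```

Mapping both statistics through reference CDFs puts them on a common [0, 1] scale, which is one plausible reading of why the abstract describes the scores as comparable across models and tasks.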
Problem

Research questions and friction points this paper is trying to address.

hallucination detection
large language models
entropy distribution
black-box access
factual correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibrated Entropy Score
entropy distribution
hallucination detection
black-box method
statistical hypothesis test