🤖 AI Summary
Large language models (LLMs) frequently generate "hallucinations": semantically plausible yet factually incorrect outputs that pose significant risks to safe deployment. To address this, the paper proposes a hallucination detection paradigm grounded in Bayesian uncertainty estimation. Rather than relying solely on token-level sampling randomness, the method injects targeted noise into model parameters or hidden-layer activations, enabling more robust uncertainty modeling. Confidence is then quantified by measuring the dispersion of the response distribution across multiple perturbed samples. Crucially, the approach requires no additional training or fine-tuning; its only cost is drawing several samples at inference time, as any sampling-based detector must. Experiments across diverse LLM architectures (e.g., LLaMA, Mistral, Qwen) and standard hallucination benchmarks (e.g., TruthfulQA, HaluEval, FactScore) show consistent and substantial improvements in detection accuracy. The method is distinguished by its simplicity, architectural agnosticism, and efficiency, offering a practical plug-and-play tool for assessing LLM trustworthiness.
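The noise-injection idea can be illustrated with a minimal sketch. The function below adds Gaussian noise to a weight matrix, producing a family of perturbed model copies from which samples can be drawn; the noise scale `sigma` and the choice of which weights to perturb are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def perturb_weights(weights, sigma=0.05, seed=0):
    """Return a copy of a weight matrix with i.i.d. Gaussian noise added.

    sigma is a hypothetical noise scale; which parameter subset to perturb
    and how to set sigma are design choices made by the actual method.
    """
    rng = np.random.default_rng(seed)
    return weights + rng.normal(0.0, sigma, size=weights.shape)

# Toy "layer": each perturbed copy would yield slightly different
# activations, and hence slightly different sampled responses.
W = np.ones((4, 4))
samples = [perturb_weights(W, seed=s) for s in range(3)]
```

Equivalently, the same noise could be added to hidden-unit activations during the forward pass (e.g., via a forward hook) instead of to the weights themselves.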
📝 Abstract
Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations, so detecting hallucinations effectively is crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that they can be detected by measuring the dispersion of the answer distribution over a set of samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work we argue that it is sub-optimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking model uncertainty into account in the Bayesian sense. To this end, we propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate its effectiveness across a wide range of datasets and model architectures.
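The dispersion measurement the abstract describes can be sketched as follows: collect one answer per perturbed sample and score uncertainty by the Shannon entropy of the empirical answer distribution. The entropy choice here is a common dispersion measure used for illustration, not necessarily the paper's exact metric.

```python
from collections import Counter
import math

def dispersion_score(answers):
    """Shannon entropy of the empirical distribution over sampled answers.

    High entropy (perturbed samples disagree) suggests a hallucination;
    zero entropy (all samples agree) suggests a stable, confident answer.
    """
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# All perturbed samples agree -> score 0.0; disagreement -> positive score.
stable = dispersion_score(["Paris"] * 5)
unstable = dispersion_score(["Paris", "Lyon", "Nice", "Paris", "Lyon"])
```

In practice the string answers would first be normalized or clustered by semantic equivalence, since surface variation alone should not count as disagreement.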