MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations

📅 2025-06-02

🤖 AI Summary

Large language models (LLMs) frequently generate factually inconsistent “hallucinations,” necessitating efficient, general-purpose unsupervised detection methods. To address this, we propose a novel hallucination detection framework based on trajectory shape analysis of the Maximum Mean Discrepancy (MMD) computed across token sequences sampled at varying softmax temperatures. Specifically, we derive an MMD sequence characterizing the implicit distributional shift and analyze its geometric properties—such as curvature and monotonicity—to discriminate hallucinated outputs. This is the first approach to treat MMD trajectory morphology as the primary discriminative signal, requiring no fine-tuning, human annotations, or reference texts, while offering strong interpretability and black-box compatibility. Evaluated on two machine translation benchmarks, our method substantially outperforms existing unsupervised baselines in detection accuracy, with low computational overhead. It establishes a lightweight, generalizable paradigm for trustworthy LLM evaluation.

📝 Abstract

Large language models (LLMs) have become pervasive in everyday life. Yet a fundamental obstacle prevents their use in many critical applications: their propensity to generate fluent, human-quality content that is not grounded in reality. Detecting such hallucinations is therefore of the highest importance. In this work, we propose a new method to flag hallucinated content, MMD-Flagger. It relies on Maximum Mean Discrepancy (MMD), a non-parametric distance between distributions. At a high level, MMD-Flagger tracks the MMD between the generated documents and documents generated with various temperature parameters. We show empirically that inspecting the shape of this trajectory is sufficient to detect most hallucinations. The method is benchmarked on two machine translation datasets, on which it outperforms natural competitors.
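
For readers unfamiliar with MMD, the standard textbook definition is given below. The abstract does not say which kernel or estimator the paper uses, so the kernel k, the samples x_i and y_j, and the choice of the biased estimator are assumptions made here for illustration only.

```latex
% Squared MMD between distributions P and Q for a positive-definite kernel k,
% with x, x' drawn independently from P and y, y' drawn independently from Q:
\mathrm{MMD}^2(P, Q)
  = \mathbb{E}\,[k(x, x')] - 2\,\mathbb{E}\,[k(x, y)] + \mathbb{E}\,[k(y, y')]

% Biased empirical estimate from samples x_1, \dots, x_n \sim P
% and y_1, \dots, y_m \sim Q:
\widehat{\mathrm{MMD}}^2
  = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} k(x_i, x_{i'})
  - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j)
  + \frac{1}{m^2} \sum_{j=1}^{m} \sum_{j'=1}^{m} k(y_j, y_{j'})
```

The estimate is zero when the two samples coincide and grows as the two distributions move apart, which is what makes it usable as a distance between sets of generated documents.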

Problem

Research questions and friction points this paper is trying to address.

Detect hallucinations in large language model outputs

Measure distribution discrepancy using Maximum Mean Discrepancy

Flag unrealistic content via temperature-based trajectory analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Maximum Mean Discrepancy for detection

Tracks MMD across temperature variations

Outperforms competitors on translation datasets
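
The idea of tracking MMD across temperatures can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that documents have already been mapped to fixed-size embedding vectors, that an RBF kernel with a hand-picked bandwidth is used, and that the biased estimator is acceptable. The function names `rbf_kernel`, `mmd2_biased`, and `mmd_trajectory` are hypothetical, and the synthetic data stands in for real embeddings. The rule the paper applies to the trajectory's shape is not described in this summary, so the sketch stops at computing the trajectory.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_trajectory(reference, samples_by_temperature, bandwidth=1.0):
    """Squared MMD between a reference sample and each temperature's sample.

    samples_by_temperature maps a temperature (float) to an (m, d) array of
    embeddings generated at that temperature (an assumed data layout).
    Returns the sorted temperatures and the matching list of MMD values.
    """
    temperatures = sorted(samples_by_temperature)
    values = [
        mmd2_biased(reference, samples_by_temperature[t], bandwidth)
        for t in temperatures
    ]
    return temperatures, values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for embeddings: higher temperature drifts further.
    reference = rng.normal(size=(40, 8))
    samples = {
        t: rng.normal(loc=t, size=(40, 8)) for t in (0.2, 0.5, 1.0, 1.5)
    }
    temps, traj = mmd_trajectory(reference, samples)
    for t, v in zip(temps, traj):
        print(f"temperature={t:.1f}  mmd2={v:.4f}")
```

A detector built on this would then inspect how the values change with temperature. Any threshold or shape criterion would have to come from the paper itself.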
