Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing hallucination detection methods in cross-domain settings. It presents the first systematic study of generalizable hallucination detection and reveals that hallucinated dialogues consistently exhibit pronounced spikes in uncertainty during multi-turn interactions. Building on this insight, the authors propose SpikeScore, an unsupervised metric that quantifies uncertainty fluctuations to detect hallucinations across domains. Theoretical analysis establishes the separability of this phenomenon, and a multi-turn dialogue simulation framework is developed for cross-domain evaluation. Extensive experiments demonstrate that SpikeScore significantly outperforms current methods across multiple large language models and benchmarks, achieving superior generalization and detection performance without requiring labeled data.

📝 Abstract

Hallucination detection is critical for deploying large language models (LLMs) in real-world applications. Existing hallucination detection methods achieve strong performance when the training and test data come from the same domain, but they suffer from poor cross-domain generalization. In this paper, we study an important yet overlooked problem, termed generalizable hallucination detection (GHD), which aims to train hallucination detectors on data from a single domain while ensuring robust performance across diverse related domains. In studying GHD, we simulate multi-turn dialogues following an LLM's initial response and observe an interesting phenomenon: hallucination-initiated multi-turn dialogues universally exhibit larger uncertainty fluctuations than factual ones across different domains. Based on this phenomenon, we propose a new score, SpikeScore, which quantifies abrupt fluctuations in multi-turn dialogues. Through both theoretical analysis and empirical validation, we demonstrate that SpikeScore achieves strong cross-domain separability between hallucinated and non-hallucinated responses. Experiments across multiple LLMs and benchmarks demonstrate that the SpikeScore-based detection method outperforms representative baselines in cross-domain generalization and surpasses advanced generalization-oriented methods, verifying the effectiveness of our method in cross-domain hallucination detection.
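
The abstract does not give the formula for SpikeScore, so the following is only an illustrative sketch of the general idea, not the paper's definition. It assumes each simulated dialogue has already been reduced to one uncertainty value per turn, and it uses the largest turn-to-turn jump as a stand-in for "abrupt fluctuation". The names `spike_score` and `auroc`, and all numbers in the example, are hypothetical.

```python
from typing import Sequence


def spike_score(uncertainties: Sequence[float]) -> float:
    """Hypothetical proxy: largest absolute jump in uncertainty between
    consecutive turns. This is an assumption, not the paper's formula."""
    if len(uncertainties) < 2:
        return 0.0
    return max(abs(b - a) for a, b in zip(uncertainties, uncertainties[1:]))


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative,
    with ties counted as half a win."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up per-turn uncertainty traces, for illustration only.
factual_dialogues = [[0.20, 0.22, 0.21, 0.23], [0.30, 0.28, 0.31, 0.29]]
hallucinated_dialogues = [[0.20, 0.65, 0.25, 0.60], [0.30, 0.35, 0.80, 0.40]]

factual_scores = [spike_score(d) for d in factual_dialogues]
hallucinated_scores = [spike_score(d) for d in hallucinated_dialogues]

# Separability of the two groups under this proxy score.
separability = auroc(hallucinated_scores, factual_scores)
```

Because a score of this kind depends only on the shape of the uncertainty trace and not on domain-specific features, the same threshold could in principle be reused across domains, which is the property the abstract attributes to SpikeScore.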

Problem

Research questions and friction points this paper is trying to address.

hallucination detection

cross-domain generalization

large language models

generalizable hallucination detection

out-of-domain

Innovation

Methods, ideas, or system contributions that make the work stand out.

SpikeScore

cross-domain hallucination detection

generalizable hallucination detection