🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated text, yet existing detection methods rely on external knowledge bases, supervised fine-tuning, or large-scale annotated data, and they lack fine-grained hallucination categorization. This work introduces *hallucination probing*, a new task that classifies LLM-generated text into three categories: *aligned*, *misaligned*, and *fabricated*, without requiring external knowledge or labeled supervision. Leveraging pronounced differences in internal activation patterns across model layers under critical entity perturbations, the authors propose SHINE: an unsupervised, fine-tuning-free method that integrates input perturbation analysis, inter-layer activation modeling, and zero-shot pattern discrimination to enable both detection and fine-grained classification. Evaluated across four LLMs and four benchmark datasets, SHINE consistently outperforms seven state-of-the-art baselines in hallucination detection, achieving new SOTA performance, and is the first method to accurately distinguish all three categories of generated text.
📝 Abstract
LLM hallucination, where unfaithful text is generated, presents a critical challenge for LLMs' practical applications. Current detection methods often resort to external knowledge, LLM fine-tuning, or supervised training on large hallucination-labeled datasets. Moreover, these approaches do not distinguish between different types of hallucinations, a distinction that is crucial for enhancing detection performance. To address these limitations, we introduce hallucination probing, a new task that classifies LLM-generated text into three categories: aligned, misaligned, and fabricated. Driven by our novel discovery that perturbing key entities in prompts affects an LLM's generation of these three types of text differently, we propose SHINE, a novel hallucination probing method that requires no external knowledge, supervised training, or LLM fine-tuning. SHINE is effective in hallucination probing across three modern LLMs, and achieves state-of-the-art performance in hallucination detection, outperforming seven competing methods across four datasets and four LLMs, underscoring the importance of probing for accurate detection.
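The abstract's core idea, that a key-entity perturbation shifts the model's internal activations differently for aligned, misaligned, and fabricated text, can be illustrated with a toy sketch. This is not SHINE's actual algorithm: the per-layer shift metric (1 − cosine similarity), the thresholds, and the direction of the decision rule are all illustrative assumptions; the paper's real method models inter-layer activation patterns in a more principled way.

```python
import math

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def layerwise_shift(acts_orig, acts_pert):
    """Per-layer activation shift (1 - cosine similarity) between the
    activations for the original prompt and for the prompt with a key
    entity perturbed. Each input is a list of per-layer vectors."""
    return [1.0 - cosine(o, p) for o, p in zip(acts_orig, acts_pert)]

def classify(shifts, low=0.1, high=0.5):
    """Hypothetical zero-shot decision rule: text grounded in the prompt's
    entities reacts strongly to perturbing them, fabricated text barely
    reacts, and misaligned text falls in between. Thresholds are made up
    for illustration."""
    mean_shift = sum(shifts) / len(shifts)
    if mean_shift >= high:
        return "aligned"
    if mean_shift <= low:
        return "fabricated"
    return "misaligned"
```

In practice the per-layer vectors would come from a model's hidden states for the original and entity-perturbed prompts; here any pair of equal-length vector lists works.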