🤖 AI Summary
Large language models (LLMs) frequently generate factually incorrect content (so-called hallucinations), hindering their deployment in high-stakes domains such as finance and healthcare. To address this, we propose D²HScore, a training-free, annotation-free hallucination detection framework. Unlike prior approaches, D²HScore operates entirely on internal model dynamics: it jointly quantifies intra-layer semantic dispersion and inter-layer semantic drift across LLM transformer layers, while leveraging attention weights to identify conceptually critical tokens, enabling highly interpretable detection. Grounded in the intrinsic multi-layer architecture and autoregressive decoding behavior of LLMs, D²HScore requires no external data, fine-tuning, or supervision. Extensive evaluation across five open-source LLMs and five standard hallucination benchmarks demonstrates that D²HScore consistently outperforms existing unsupervised baselines, exhibits strong cross-model generalization, and enables plug-and-play deployment, establishing a novel, practical paradigm for trustworthy LLM evaluation.
📄 Abstract
Although Large Language Models (LLMs) have achieved remarkable success, their practical application is often hindered by the generation of non-factual content, known as "hallucination". Ensuring the reliability of LLM outputs is a critical challenge, particularly in high-stakes domains such as finance, security, and healthcare. In this work, we revisit hallucination detection from the perspective of model architecture and generation dynamics. Leveraging the multi-layer structure and autoregressive decoding process of LLMs, we decompose hallucination signals into two complementary dimensions: the semantic breadth of token representations within each layer, and the semantic depth of core concepts as they evolve across layers. Based on this insight, we propose **D²HScore (Dispersion and Drift-based Hallucination Score)**, a training-free and label-free framework that jointly measures: (1) **Intra-Layer Dispersion**, which quantifies the semantic diversity of token representations within each layer; and (2) **Inter-Layer Drift**, which tracks the progressive transformation of key token representations across layers. To ensure that drift reflects the evolution of meaningful semantics rather than noisy or redundant tokens, we guide token selection using attention signals. By capturing both the horizontal and vertical dynamics of representations during inference, D²HScore provides an interpretable and lightweight proxy for hallucination detection. Extensive experiments across five open-source LLMs and five widely used benchmarks demonstrate that D²HScore consistently outperforms existing training-free baselines.
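To make the two signals concrete, the following is a minimal numpy sketch of how intra-layer dispersion and attention-guided inter-layer drift could be computed from a stack of per-layer token representations. The function name, the aggregation choices (mean pairwise cosine distance for dispersion, top-k attention tokens and consecutive-layer cosine distance for drift), and the final unweighted sum are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def d2h_score_sketch(hidden_states, attention, top_k=5):
    """Illustrative dispersion + drift score (hypothetical helper).

    hidden_states: (num_layers, num_tokens, dim) token representations
        collected from each transformer layer during generation.
    attention: (num_tokens,) aggregate attention weight per token,
        used here to select the top_k "key" tokens for drift.

    NOTE: the actual D²HScore aggregation and weighting are not
    specified in the abstract; this only sketches the two signals.
    """
    L, T, _ = hidden_states.shape
    # Normalize so cosine similarity reduces to a dot product.
    normed = hidden_states / (
        np.linalg.norm(hidden_states, axis=-1, keepdims=True) + 1e-8
    )

    # (1) Intra-layer dispersion: mean pairwise cosine distance per layer,
    # then averaged across layers. Higher means more semantic spread.
    dispersions = []
    for layer in normed:
        sims = layer @ layer.T                     # (T, T) cosine similarities
        mean_sim = (sims.sum() - T) / (T * (T - 1))  # exclude self-similarity
        dispersions.append(1.0 - mean_sim)
    dispersion = float(np.mean(dispersions))

    # (2) Inter-layer drift: pick the top_k most-attended tokens and
    # measure how far each moves between consecutive layers.
    key = np.argsort(attention)[-top_k:]
    step_sims = np.einsum("ltd,ltd->lt", normed[:-1, key], normed[1:, key])
    drift = float(np.mean(1.0 - step_sims))

    # Combine the two signals (assumed equal weighting for illustration).
    return dispersion + drift
```

In practice, such hidden states and attention weights can be obtained from open-source LLMs at inference time (e.g. by requesting hidden states and attentions from the forward pass), which is what makes this family of scores training-free and plug-and-play.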