🤖 AI Summary
This work addresses the susceptibility of large language models (LLMs) to two distinct types of hallucinations—data-driven and reasoning-driven—in high-stakes domains such as healthcare and law. Existing detection methods are often limited to a single source of hallucination and exhibit poor generalization. To overcome these limitations, this study proposes the first unified theoretical framework for hallucination risk, decoupling it into data- and reasoning-related components. Building upon the Neural Tangent Kernel (NTK), the authors introduce HalluGuard, a scoring mechanism that leverages NTK-based geometric representations to enable joint hallucination detection without task-specific heuristics. Extensive evaluation across 10 benchmarks, 11 baselines, and 9 prominent LLMs demonstrates that HalluGuard consistently outperforms current state-of-the-art methods, significantly enhancing the identification of diverse hallucination patterns.
📝 Abstract
The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, an NTK-based score that leverages the geometry and representations induced by the Neural Tangent Kernel to jointly identify data-driven and reasoning-driven hallucinations. Across 10 diverse benchmarks, 11 competitive baselines, and 9 popular LLM backbones, HalluGuard consistently achieves state-of-the-art performance in detecting varied forms of LLM hallucinations.
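The abstract does not spell out how an NTK-based score is computed. As background intuition only (not the paper's actual method): the empirical NTK of a model \(f_\theta\) is the Gram matrix of parameter gradients, \(k(x, x') = \langle \nabla_\theta f_\theta(x), \nabla_\theta f_\theta(x') \rangle\), so inputs whose gradients point in unfamiliar directions relative to trusted data can be flagged as higher-risk. The sketch below illustrates this idea on a toy two-layer model; all names (`grad_features`, `ntk_score`) and the nearest-reference scoring rule are hypothetical, not taken from HalluGuard.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer model f(x) = w . tanh(V x), standing in for an LLM head.
d_in, d_h = 4, 8
V = rng.normal(size=(d_h, d_in))
w = rng.normal(size=d_h)

def grad_features(x):
    """Flattened parameter gradient of f at x -- the empirical NTK feature map."""
    h = np.tanh(V @ x)
    dw = h                              # df/dw = tanh(Vx)
    dV = np.outer(w * (1.0 - h**2), x)  # df/dV via the chain rule
    return np.concatenate([dw, dV.ravel()])

def ntk_similarity(x1, x2):
    """Cosine-normalized empirical NTK: <grad f(x1), grad f(x2)>."""
    g1, g2 = grad_features(x1), grad_features(x2)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

def ntk_score(query, references):
    """Risk proxy: 1 - max NTK similarity to a set of trusted reference inputs.
    Queries geometrically far from the training-like region score higher."""
    return 1.0 - max(ntk_similarity(query, r) for r in references)

references = [rng.normal(size=d_in) for _ in range(16)]
in_dist = references[0]                  # identical to a trusted reference
off_dist = 50.0 * rng.normal(size=d_in)  # far outside the reference region
```

Here `ntk_score(in_dist, references)` is essentially zero, while the off-distribution query scores strictly higher, mirroring the intuition that hallucination-prone inputs occupy unfamiliar regions of the NTK-induced geometry.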