Hallucination Detection in LLMs Using Spectral Features of Attention Maps

📅 2025-02-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited representational capacity of attention maps in hallucination detection for large language models (LLMs), this paper proposes LapEigvals: a lightweight supervised probe that treats intra-layer attention maps as weighted graphs and employs the top-k eigenvalues of their normalized Laplacian matrices as low-dimensional spectral representations. This work is the first to introduce spectral graph theory into LLM hallucination detection, overcoming representational bottlenecks inherent in conventional statistical or attention-visualization–based approaches. Evaluated across multiple benchmark datasets, LapEigvals achieves state-of-the-art performance, significantly outperforming existing attention-driven methods. Moreover, it demonstrates strong robustness and cross-model generalization capability. By providing interpretable, scalable, and theoretically grounded spectral signatures, LapEigvals establishes a novel paradigm for hallucination identification in safety-critical applications.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $ ext{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $ ext{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
Problem

Research questions and friction points this paper is trying to address.

Detects hallucinations in Large Language Models
Uses spectral features of attention maps
Proposes LapEigvals for improved detection accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral features of attention maps
Laplacian matrix eigenvalues
State-of-the-art hallucination detection
🔎 Similar Papers
No similar papers found.