🤖 AI Summary
To address the limited representational capacity of attention maps in hallucination detection for large language models (LLMs), this paper proposes LapEigvals: a lightweight supervised probe that treats intra-layer attention maps as weighted graphs and uses the top-k eigenvalues of their normalized Laplacian matrices as low-dimensional spectral representations. The work is the first to bring spectral graph theory to LLM hallucination detection, overcoming representational bottlenecks inherent in conventional statistical or attention-visualization approaches. Evaluated across multiple benchmark datasets, LapEigvals achieves state-of-the-art performance, significantly outperforming existing attention-based methods, and demonstrates strong robustness and cross-model generalization. By providing interpretable, scalable, and theoretically grounded spectral signatures, LapEigvals establishes a novel paradigm for hallucination identification in safety-critical applications.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $\text{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $\text{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
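The core feature-extraction step described above — viewing an attention map as a weighted graph and taking the top-$k$ eigenvalues of its normalized Laplacian — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the symmetrization step, the choice of normalized Laplacian, and the zero-padding for short sequences are all assumptions filled in for the sake of a runnable example.

```python
import numpy as np

def lap_eigvals_features(attn: np.ndarray, k: int = 10) -> np.ndarray:
    """Top-k normalized-Laplacian eigenvalues of one attention map.

    attn: (n, n) attention matrix for a single head (rows sum to 1).
    Returns a length-k feature vector, zero-padded if n < k.
    """
    # Symmetrize so the graph is undirected (an assumption; the paper
    # may treat the directed attention graph differently).
    W = 0.5 * (attn + attn.T)
    deg = W.sum(axis=1)
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)   # real eigenvalues, ascending order
    top_k = eigvals[::-1][:k]         # keep the k largest, descending
    if top_k.size < k:                # pad sequences shorter than k tokens
        top_k = np.pad(top_k, (0, k - top_k.size))
    return top_k

# Usage: build a random causal (lower-triangular, row-stochastic) attention
# map and extract its spectral features.
rng = np.random.default_rng(0)
n = 16
logits = rng.normal(size=(n, n))
logits[~np.tril(np.ones((n, n), dtype=bool))] = -np.inf  # causal mask
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)

feats = lap_eigvals_features(attn, k=10)
print(feats.shape)  # (10,)
```

In a full pipeline, such vectors (typically concatenated across heads and layers) would serve as inputs to a supervised probe, e.g. a logistic-regression classifier trained on labeled hallucinated versus faithful generations.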