🤖 AI Summary
To address the limited representational capacity of attention maps in hallucination detection for large language models (LLMs), this paper proposes LapEigvals: a lightweight supervised probe that treats intra-layer attention maps as weighted graphs and uses the top-k eigenvalues of their normalized Laplacian matrices as low-dimensional spectral representations. The work is the first to bring spectral graph theory to LLM hallucination detection, overcoming representational bottlenecks inherent in conventional statistical or attention-visualization approaches. Evaluated across multiple benchmark datasets, LapEigvals achieves state-of-the-art performance, significantly outperforming existing attention-based methods, and demonstrates strong robustness and cross-model generalization. By providing interpretable, scalable, and theoretically grounded spectral signatures, LapEigvals establishes a novel paradigm for hallucination identification in safety-critical applications.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $\text{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $\text{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
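The core feature-extraction step described above — viewing an attention map as a weighted graph and taking the top-$k$ eigenvalues of its normalized Laplacian — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the symmetrization step, the choice of normalized Laplacian, and the zero-padding for short sequences are all assumptions filled in for the sake of a runnable example.

```python
import numpy as np

def lap_eigvals_features(attn: np.ndarray, k: int = 10) -> np.ndarray:
    """Top-k normalized-Laplacian eigenvalues of one attention map.

    attn: (n, n) attention matrix for a single head (rows sum to 1).
    Returns a length-k feature vector, zero-padded if n < k.
    """
    # Symmetrize so the graph is undirected (an assumption; the paper
    # may treat the directed attention graph differently).
    W = 0.5 * (attn + attn.T)
    deg = W.sum(axis=1)
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)   # real eigenvalues, ascending order
    top_k = eigvals[::-1][:k]         # keep the k largest, descending
    if top_k.size < k:                # pad sequences shorter than k tokens
        top_k = np.pad(top_k, (0, k - top_k.size))
    return top_k

# Usage: build a random causal (lower-triangular, row-stochastic) attention
# map and extract its spectral features.
rng = np.random.default_rng(0)
n = 16
logits = rng.normal(size=(n, n))
logits[~np.tril(np.ones((n, n), dtype=bool))] = -np.inf  # causal mask
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)

feats = lap_eigvals_features(attn, k=10)
print(feats.shape)  # (10,)
```

In a full pipeline, such vectors (typically concatenated across heads and layers) would serve as inputs to a supervised probe, e.g. a logistic-regression classifier trained on labeled hallucinated versus faithful generations.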