HARP: Hallucination Detection via Reasoning Subspace Projection

📅 2025-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of semantic-reasoning entanglement and insufficient robustness in large language model (LLM) hallucination detection, this paper proposes a novel detection framework based on reasoning subspace projection. Methodologically, we establish theoretically and validate empirically, for the first time, that the LLM hidden state space admits an orthogonal direct-sum decomposition into semantic and reasoning subspaces. By applying singular value decomposition (SVD) to the unembedding layer parameters, we extract an orthonormal basis for the reasoning subspace and project hidden states onto it, yielding compact, highly discriminative reasoning features at roughly 5% of the original dimensionality. Evaluated on multiple benchmarks including TriviaQA, our method achieves an AUROC of 92.8%, outperforming the prior state of the art by 7.5 percentage points, with marked improvements in both detection accuracy and noise robustness.
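The projection step described above can be sketched with a toy unembedding matrix. The matrix sizes, the choice of which singular directions span the reasoning subspace, and the 5% cutoff are illustrative assumptions here, not the paper's exact procedure:

```python
import numpy as np

# Toy stand-ins; real models have vocab sizes of ~32k+ and hidden sizes of ~4k.
V, d = 1000, 256
rng = np.random.default_rng(0)
W_U = rng.standard_normal((V, d))    # hypothetical unembedding matrix

# SVD of the unembedding parameters: the rows of Vt form an orthonormal
# basis of the hidden-state space, ordered by singular value.
_, S, Vt = np.linalg.svd(W_U, full_matrices=False)

# Assumption for illustration: keep ~5% of the directions as the
# reasoning-subspace basis (the paper's selection rule is not shown here).
k = int(0.05 * d)
B_reason = Vt[-k:]                   # (k, d) orthonormal basis

# Projecting a hidden state onto this basis yields the compact feature.
h = rng.standard_normal(d)           # hidden state from some layer
feature = B_reason @ h               # (k,) reasoning feature
```

Because the basis rows are orthonormal, the projection is a simple matrix-vector product, and the feature dimension drops from `d` to `k` while preserving the component of the hidden state that lies in the chosen subspace.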

📝 Abstract
Hallucinations in Large Language Models (LLMs) pose a major barrier to their reliable use in critical decision-making. Although existing hallucination detection methods have improved accuracy, they still struggle with disentangling semantic and reasoning information and maintaining robustness. To address these challenges, we propose HARP (Hallucination detection via reasoning subspace projection), a novel hallucination detection framework. HARP establishes that the hidden state space of LLMs can be decomposed into a direct sum of a semantic subspace and a reasoning subspace, where the former encodes linguistic expression and the latter captures internal reasoning processes. Moreover, we demonstrate that the unembedding layer can disentangle these subspaces, and by applying Singular Value Decomposition (SVD) to its parameters, we obtain the basis vectors spanning the semantic and reasoning subspaces. Finally, HARP projects hidden states onto the basis vectors of the reasoning subspace, and the resulting projections are used as input features for hallucination detection in LLMs. By using these projections, HARP reduces the feature dimension to approximately 5% of the original, filters out most noise, and achieves enhanced robustness. Experiments across multiple datasets show that HARP achieves state-of-the-art hallucination detection performance; in particular, it achieves an AUROC of 92.8% on TriviaQA, outperforming the previous best method by 7.5 percentage points.
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in Large Language Models
Disentangling semantic and reasoning information
Enhancing robustness in hallucination detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes hidden states into semantic and reasoning subspaces
Uses SVD on unembedding layer to obtain basis vectors
Projects hidden states onto reasoning subspace for detection
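The three innovation bullets above can be combined into a minimal end-to-end sketch, using random stand-ins for the model's unembedding matrix and hidden states, and a plain logistic probe as the detector. The classifier head, the subspace selection rule, and the synthetic labels are all assumptions for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 1000, 256
k = int(0.05 * d)                      # ~5% of the hidden dimension

# Step 1: SVD of a stand-in unembedding matrix gives orthonormal directions.
_, _, Vt = np.linalg.svd(rng.standard_normal((V, d)), full_matrices=False)
B = Vt[-k:]                            # assumed reasoning-subspace basis, (k, d)

# Step 2: project hidden states onto the reasoning subspace.
H = rng.standard_normal((200, d))      # toy hidden states, one per response
X = H @ B.T                            # (200, k) compact features
y = (X[:, 0] > 0).astype(float)        # synthetic labels: 1 = hallucination

# Step 3: train a tiny logistic-regression probe on the projected features.
w = np.zeros(k)
for _ in range(500):                   # plain full-batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w)))) > 0.5) == y)
```

On these synthetic, linearly separable labels the probe fits easily; the point of the sketch is the pipeline shape (SVD basis, projection, low-dimensional probe), not the numbers.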
Junjie Hu
School of Computer Science and Technology, Huazhong University of Science and Technology
Gang Tu
School of Computer Science and Technology, Huazhong University of Science and Technology
ShengYu Cheng
School of Computer Science and Technology, Huazhong University of Science and Technology
Jinxin Li
Stanford University
Jinting Wang
Central University of Finance and Economics
Rui Chen
School of Computer Science and Technology, Huazhong University of Science and Technology
Zhilong Zhou
School of Computer Science and Technology, Huazhong University of Science and Technology
Dongbo Shan
School of Computer Science and Technology, Huazhong University of Science and Technology