DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness

πŸ“… 2026-01-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work proposes a lightweight residual probe that reads hallucination risk in real time from the intermediate hidden states of question tokens during decoding, leveraging implicit uncertainty signals already present in large language models' internal representations. Because it offers zero-latency, low-overhead access to these internal signals, the method can function as an agent critic to support selective generation and routing under parallel inference. Evaluated across multiple mainstream large language models and four question-answering benchmarks, the approach achieves strong AUROC and AURAC, demonstrating robust generalization. It also reveals interpretable uncertainty structure within intermediate representations, offering a new path toward building reliable AI agents.
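The summary above describes a probe that maps question-token hidden states to a hallucination-risk score. As a minimal sketch (the paper's exact probe architecture, pooling, and layer choice are not specified here; a mean-pooled linear head with a sigmoid is one plausible form, and all names and shapes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_probe(hidden_states, W, b):
    """Score hallucination risk from question-token hidden states.

    hidden_states: (num_question_tokens, d) intermediate-layer activations.
    W, b: learned probe parameters (here random, for illustration only).
    Mean-pool over question tokens, apply a linear head, squash to (0, 1).
    """
    pooled = hidden_states.mean(axis=0)          # (d,)
    logit = pooled @ W + b                       # scalar logit
    return 1.0 / (1.0 + np.exp(-logit))          # risk score in (0, 1)

# Toy example: 5 question tokens, hidden size d = 16, random activations.
d = 16
H = rng.standard_normal((5, d))
W = rng.standard_normal(d) * 0.1
risk = residual_probe(H, W, b=0.0)
print(0.0 < risk < 1.0)
```

Because the probe only reads activations that the model computes anyway, it can run alongside decoding with negligible extra cost, which is what enables the "before the answer is produced" detection the abstract claims.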

πŸ“ Abstract
LLMs often produce fluent but incorrect answers, yet detecting such hallucinations typically requires multiple sampling passes or post-hoc verification, adding significant latency and cost. We hypothesize that intermediate layers encode confidence signals that are lost in the final output layer, and propose a lightweight probe to read these signals directly from hidden states. The probe adds less than 0.1% computational overhead and can run fully in parallel with generation, enabling hallucination detection before the answer is produced. Building on this, we develop an LLM router that answers confident queries immediately while delegating uncertain ones to stronger models. Despite its simplicity, our method achieves SOTA AUROC on 10 out of 12 settings across four QA benchmarks and three LLM families, with gains of up to 13 points over prior methods, and generalizes across dataset shifts without retraining.
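The router described in the abstract can be sketched as a simple threshold rule on the probe's risk score: low-risk queries are answered by the local model, high-risk ones are escalated. The threshold value and routing labels below are illustrative assumptions, not the paper's actual configuration:

```python
def route_query(risk_score, threshold=0.5):
    """Route based on probe-estimated hallucination risk.

    risk_score: probe output in (0, 1); higher means more likely to hallucinate.
    threshold:  illustrative cutoff; in practice it would be tuned on
                validation data to trade off cost against accuracy.
    Returns "answer_locally" (keep the cheap model's answer) or
    "delegate" (escalate the query to a stronger model).
    """
    return "delegate" if risk_score >= threshold else "answer_locally"

print(route_query(0.2))  # confident query: answer_locally
print(route_query(0.8))  # uncertain query: delegate
```

A single scalar threshold keeps routing decisions as cheap as the probe itself, so the selective-generation pipeline adds essentially no latency on the confident path.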
Problem

Research questions and friction points this paper is trying to address.

hallucination
large language models
uncertainty
faithful readout
AI reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination detection
latent probing
uncertainty estimation
efficient inference
agentic AI
Rohan Bhatnagar
Department of Computer Science, University of Maryland, College Park, MD, USA
Youran Sun
Department of Mathematics, University of Maryland, College Park, MD, USA
Chi Andrew Zhang
Department of Statistics, University of Chicago, Chicago, IL, USA
Yixin Wen
Department of Geography, University of Florida, Gainesville, FL, USA
Haizhao Yang
Department of Mathematics and Department of Computer Science, University of Maryland, College Park, MD, USA
Data science, machine learning, high-performance computing, numerical linear algebra, applied and