Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations

📅 2026-01-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinations, yet the internal mechanisms underlying their truthfulness judgments remain poorly understood. This work presents evidence that LLMs employ two distinct and complementary pathways for assessing factual consistency: a question-anchored pathway that relies on information flowing from the input question, and an answer-anchored pathway that evaluates the generated answer on its own self-contained evidence. The two pathways are disentangled through attention knockout and token patching, are closely tied to the model's knowledge boundaries, and are distinguished by the model's internal representations. Building on these findings, the authors propose two applications that improve hallucination detection performance. The results offer a foundation for more reliable and self-aware generative systems capable of monitoring the fidelity of their own output.
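
The detection method itself is not spelled out in this entry. As a rough, self-contained sketch of how truthfulness signals are commonly read out of internal states, the snippet below trains a linear probe on the hidden state of the final answer token; the model name, probe layer, and the tiny labeled examples are placeholders rather than the paper's setup.

```python
# Minimal sketch of a hidden-state truthfulness probe (illustrative, not the paper's method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"   # placeholder; any causal LM with exposed hidden states works
PROBE_LAYER = 6       # placeholder mid-depth layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def answer_state(question: str, answer: str) -> torch.Tensor:
    """Hidden state of the last answer token at PROBE_LAYER."""
    text = f"Question: {question}\nAnswer: {answer}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[0] is the embedding output, so layer L sits at index L
    return out.hidden_states[PROBE_LAYER][0, -1]

# Toy labels: 1 = hallucinated answer, 0 = faithful answer
examples = [
    ("Who wrote Hamlet?", "William Shakespeare", 0),
    ("Who wrote Hamlet?", "Charles Dickens", 1),
    ("What is the capital of France?", "Paris", 0),
    ("What is the capital of France?", "Lyon", 1),
]

X = torch.stack([answer_state(q, a) for q, a, _ in examples]).numpy()
y = [label for _, _, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict_proba(X)[:, 1])  # per-example hallucination probability
```

In practice such a probe would be trained on many labeled generations and the layer chosen by validation accuracy; this toy run only demonstrates the extract-and-probe pipeline.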

πŸ“ Abstract
Despite their impressive capabilities, large language models (LLMs) frequently generate hallucinations. Previous work shows that their internal states encode rich signals of truthfulness, yet the origins and mechanisms of these signals remain unclear. In this paper, we demonstrate that truthfulness cues arise from two distinct information pathways: (1) a Question-Anchored pathway that depends on question-answer information flow, and (2) an Answer-Anchored pathway that derives self-contained evidence from the generated answer itself. We first validate and disentangle these pathways through attention knockout and token patching, and then uncover notable properties of the two mechanisms. Further experiments reveal that (1) the two mechanisms are closely associated with LLM knowledge boundaries; and (2) internal representations are aware of their distinctions. Finally, building on these findings, we propose two applications that enhance hallucination detection performance. Overall, our work provides new insight into how LLMs internally encode truthfulness, offering directions for more reliable and self-aware generative systems.
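
The knockout and patching experiments above operate inside a full LLM, and the paper's exact implementation is not reproduced here. As an illustration of the knockout idea alone, the toy scaled dot-product attention below blocks answer-position queries from attending to question-position keys, the kind of intervention used to test whether a truthfulness signal depends on question-to-answer information flow; all tensors and dimensions are made up for the example.

```python
# Toy illustration of attention knockout (not the paper's implementation):
# a single attention step in which answer-position queries cannot attend
# to question-position keys, severing the question-to-answer pathway.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 8, 16
question_len = 5                       # positions 0..4 play the role of question tokens
x = torch.randn(seq_len, d_model)      # stand-in hidden states

# Random projections standing in for a trained attention head
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = (q @ k.T) / d_model ** 0.5

# Standard causal mask: position i attends only to positions <= i
causal = torch.tril(torch.ones(seq_len, seq_len)).bool()
scores = scores.masked_fill(~causal, float("-inf"))

# Knockout: additionally block edges from answer queries to question keys
knockout = torch.zeros(seq_len, seq_len, dtype=torch.bool)
knockout[question_len:, :question_len] = True
knocked = scores.masked_fill(knockout, float("-inf"))

baseline_out = F.softmax(scores, dim=-1) @ v
knockout_out = F.softmax(knocked, dim=-1) @ v

# The change at answer positions is what a question-anchored signal would lose
print((baseline_out - knockout_out)[question_len:].norm(dim=-1))
```

A signal reads as question-anchored when ablating these attention edges noticeably degrades the truthfulness information at the answer positions, and as answer-anchored when it survives the knockout.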
Problem

Research questions and friction points this paper is trying to address.

LLM hallucinations
truthfulness encoding
information pathways
knowledge boundaries
internal representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

truthfulness encoding
hallucination detection
information pathways
attention knockout
token patching