🤖 AI Summary
Large language models frequently hallucinate during reasoning, producing fluent responses that lack factual grounding, yet the underlying mechanisms remain poorly understood. This work models the reasoning process of decoder-only Transformers as a search over a graph, distinguishing context-dependent intrinsic reasoning from memory-structure-driven extrinsic reasoning. For the first time, it unifies the two primary causes of hallucination through the lens of graph-structure evolution: early in training, reused memory paths override contextual constraints, while later, high-frequency multi-hop paths are compressed into shortcut edges. This framework not only elucidates how hallucinations are generated but also links them to downstream task behaviors, offering a theoretical foundation for both understanding and mitigating such phenomena.
📝 Abstract
Reasoning hallucinations in large language models (LLMs) often appear as fluent yet unsupported conclusions that violate either the given context or underlying factual knowledge. Although such failures are widely observed, the mechanisms by which decoder-only Transformers produce them remain poorly understood. We model next-token prediction as a graph search process over an underlying graph, where entities correspond to nodes and learned transitions form edges. From this perspective, contextual reasoning is a constrained search over a sampled subgraph (intrinsic reasoning), while context-free queries rely on memorized structures in the underlying graph (extrinsic reasoning). We show that reasoning hallucinations arise from two fundamental mechanisms: **Path Reuse**, where memorized knowledge overrides contextual constraints during early training, and **Path Compression**, where frequently traversed multi-step paths collapse into shortcut edges in later training. Together, these mechanisms provide a unified explanation for reasoning hallucinations in LLMs and connect them to well-known behaviors observed in downstream applications.
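The graph-search framing above can be illustrated with a toy sketch. This is purely illustrative and not the paper's implementation: the node names, graphs, and `search` helper below are all hypothetical. A "memorized" graph stands in for extrinsic knowledge, a context subgraph constrains intrinsic reasoning, Path Reuse appears when the memorized graph supports a conclusion the context does not, and Path Compression is modeled as adding a one-hop shortcut edge for a frequent multi-hop path.

```python
from collections import deque

# Hypothetical toy graphs (illustrative names, not from the paper).
memorized = {"A": {"B"}, "B": {"C"}, "C": set()}  # extrinsic: edges learned in training
context   = {"A": {"D"}, "D": set()}              # intrinsic: edges licensed by the prompt

def search(graph, start, goal):
    """Plain BFS: is `goal` reachable from `start` in `graph`?"""
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Path Reuse: memory supports A -> ... -> C, but the context does not,
# so a context-faithful answer and a memory-driven answer diverge.
assert search(memorized, "A", "C")      # extrinsic search succeeds
assert not search(context, "A", "C")    # intrinsic search fails: "C" is unsupported

# Path Compression: the frequent two-hop path A -> B -> C collapses
# into a direct shortcut edge A -> C in the memorized graph.
compressed = {**memorized, "A": memorized["A"] | {"C"}}
assert "C" in compressed["A"]           # one-hop shortcut now exists
```

In this sketch a hallucination corresponds to emitting a conclusion that the extrinsic graph reaches but the context subgraph does not, and compression shows how a multi-step inference can later be produced without traversing its intermediate nodes.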