Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In fine-tuning large language models (LLMs), knowledge injection exhibits a paradoxical coexistence of generalization and hallucination, and the underlying mechanism has remained unclear. Method: The paper proposes “out-of-context reasoning” (OCR) as a unified explanatory framework: LLMs draw cross-context inferences by associating concepts—regardless of causal validity—a capability enabled by structural properties of attention and of the parameter updates. The authors formalize OCR as a synthetic fact-recall task, theoretically link the OCR capability to the implicit nuclear-norm bias of gradient descent acting on factorized output and value projections, and instantiate the analysis in a one-layer, single-head attention model. Contribution/Results: Experiments on five mainstream LLMs confirm that OCR is the shared root cause of both generalization and hallucination; a factorized output/value matrix structure is necessary for learning OCR; and the theory offers strong interpretability, enabling principled modeling and controllable intervention in knowledge injection.

📝 Abstract
Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. The reasons for this duality remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer, single-head, attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
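The factorization claim in the abstract can be illustrated with a toy analogy (a sketch under simplifying assumptions, not the paper's actual transformer setup): gradient descent on a factorized matrix W = a·bᵀ fills in an unobserved "fact" of a low-rank fact table by association, while the same descent on an unfactorized W from zero initialization leaves that entry untouched.

```python
# Toy analogy (not the paper's experiment): rank-1 matrix completion.
# Ground truth is a rank-1 "fact table"; entry (2, 2) is hidden in training.
import numpy as np

rng = np.random.default_rng(0)

M = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])  # true table; M[2, 2] == 6
mask = np.ones_like(M)
mask[2, 2] = 0.0                                 # hide one "fact"

# Factorized parametrization W = outer(a, b), small random initialization.
a = 0.1 * rng.standard_normal(3)
b = 0.1 * rng.standard_normal(3)
lr = 0.05
for _ in range(5000):
    resid = mask * (np.outer(a, b) - M)          # error on observed entries only
    a, b = a - lr * resid @ b, b - lr * resid.T @ a
W_fact = np.outer(a, b)                          # infers the hidden entry (~6)

# Unfactorized parametrization from zero: the gradient is zero on the
# masked entry, so it can never be filled in.
W_dir = np.zeros_like(M)
for _ in range(5000):
    W_dir -= lr * mask * (W_dir - M)             # W_dir[2, 2] stays exactly 0
```

The factorized model "hallucinates" the hidden entry from its row/column associations (correctly here, because the table really is rank-1), mirroring the paper's observation that factorized output/value matrices are what make out-of-context inference possible at all.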
Problem

Research questions and friction points this paper is trying to address.

Understanding the duality of generalization and hallucination in LLMs
Identifying out-of-context reasoning as the shared root cause
Analyzing the role of matrix factorization in OCR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Out-of-context reasoning drives both generalization and hallucination
Factorized output/value matrices are necessary for learning the synthetic factual-recall task
Implicit nuclear-norm bias of gradient descent explains OCR
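The nuclear-norm bullet can be checked directly on the toy numbers (illustrative values, not from the paper): among completions of a partially observed rank-1 table, the rank-1 completion has a smaller nuclear norm (sum of singular values) than the completion that merely memorizes the observed entries and leaves the rest at zero, which is the ordering the implicit-bias argument predicts gradient descent to exploit.

```python
# Nuclear-norm ordering behind the implicit-bias claim (toy numbers).
import numpy as np

M_true = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])  # rank-1 completion
M_memo = M_true.copy()
M_memo[2, 2] = 0.0                                   # unobserved entry left at 0

def nuc(W):
    """Nuclear norm: sum of singular values."""
    return np.linalg.svd(W, compute_uv=False).sum()

print(nuc(M_true))  # ~9.165 (= sqrt(84), since rank 1)
print(nuc(M_memo))  # ~9.271, strictly larger
```

Minimizing the nuclear norm therefore prefers the low-rank, "associative" completion, i.e. the one that fills in the unseen fact, whether that inference is a correct generalization or a hallucination.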
🔎 Similar Papers
No similar papers found.