🤖 AI Summary
Large language models (LLMs) exhibit robust linear correlations in knowledge composition: semantic transformations (e.g., “city → country”) correspond to approximately linear mappings in logit space. This property resembles human knowledge composition but induces hallucinations when the learned mapping deviates from ground-truth relations.
Method: The authors systematically discover, quantify, and validate this linear structure via logit-space modeling, single-layer feedforward network fitting, and interpretability analysis of pretrained token embeddings, demonstrating its stability even after large-scale fine-tuning.
Contributions/Results: (1) They propose linear correlation as an interpretable criterion for generalization capability; (2) they empirically confirm that pretrained vocabulary representations are the primary driver of linear generalization; and (3) they establish a quantitative link between linear deviation magnitude and hallucination severity, substantially improving predictability of generalization behavior.
📝 Abstract
The generalization of language models (LMs) is the subject of active debate, contrasting their potential for general intelligence with their struggles with basic knowledge composition (e.g., the reverse/transition curse). This paper uncovers a phenomenon of linear correlations in LMs during knowledge composition: for certain related knowledge, there exists a linear transformation that maps the next-token prediction logits from one prompt to another, e.g., "X lives in the city of" $\rightarrow$ "X lives in the country of" for every given X. This mirrors the linearity in human knowledge composition, such as Paris $\rightarrow$ France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning: it generalizes updated knowledge when aligned with real-world relationships but causes hallucinations when it deviates from them. Empirical results suggest that linear correlation can serve as a potential identifier of an LM's generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization relies heavily on the latter.