Representation Alignment Rests on Linear Structure

📅 2026-05-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work investigates the origins of cross-model representation alignment. Drawing on the Linear Representation Hypothesis (LRH), it argues that the linear encoding of object-attribute relationships is a central mechanism behind such alignment. Building on a statistical framework that decomposes representations into signal, model bias, and training noise, the study uses sparse autoencoders to extract object-attribute features and applies centering and normalization to improve alignment. Experiments show that sparse representations often align more strongly across modalities than dense ones, and that centering and normalization partially mitigate model bias. A positive correlation between word frequency and alignment strength further supports the hypothesis that data scarcity induces representational noise.
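A minimal plain-Python sketch of applying centering and normalization before scoring cross-model alignment. The score used here is a mutual k-nearest-neighbor overlap, a metric from the Platonic Representation Hypothesis literature; whether this paper uses exactly this metric is an assumption. The second "model" is a toy stand-in for model-specific bias: a sign-flipped, permuted, rescaled, and shifted copy of the first. All function names are hypothetical.

```python
import math
import random


def center_and_normalize(X):
    """Subtract the per-dimension mean, then rescale each row to unit L2 norm."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    out = []
    for row in X:
        centered = [row[j] - mean[j] for j in range(d)]
        norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # guard against all-zero rows
        out.append([v / norm for v in centered])
    return out


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_sets(X, k):
    """For each row, the indices of its k most cosine-similar other rows."""
    sets = []
    for i, xi in enumerate(X):
        sims = sorted(((cosine(xi, xj), j) for j, xj in enumerate(X) if j != i), reverse=True)
        sets.append({j for _, j in sims[:k]})
    return sets


def mutual_knn_alignment(A, B, k):
    """Average fraction of shared k-nearest neighbors; row i of A and B is the same item."""
    nn_a, nn_b = knn_sets(A, k), knn_sets(B, k)
    return sum(len(sa & sb) / k for sa, sb in zip(nn_a, nn_b)) / len(A)


# Toy setup (assumption): model B sees the same items as model A through an
# orthogonal map (coordinate permutation plus sign flips), a global rescaling,
# and a constant offset standing in for model-specific bias.
rng = random.Random(0)
n, d, k = 40, 8, 5
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
perm = list(range(d))
rng.shuffle(perm)
signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
offset = [rng.uniform(5, 10) for _ in range(d)]
Y = [[2.5 * signs[j] * row[perm[j]] + offset[j] for j in range(d)] for row in X]

raw = mutual_knn_alignment(X, Y, k)
processed = mutual_knn_alignment(center_and_normalize(X), center_and_normalize(Y), k)
print(f"raw: {raw:.3f}  centered+normalized: {processed:.3f}")
```

Centering removes the constant offset and row normalization removes the global scale, so the cosine neighborhoods of the two toy models coincide after preprocessing. Real models differ by much more than an offset and an orthogonal map, which is consistent with the paper's finding that centering and normalization only partially mitigate bias.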
๐Ÿ“ Abstract
We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.
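The noise component can be illustrated with a toy simulation in plain Python. The assumptions are ours, not the paper's: both models share one coordinate system, each word has a fixed signal vector, and each model adds independent Gaussian noise whose standard deviation shrinks as 1/sqrt(frequency), mimicking finite-sample estimation. Per-word alignment is the cosine between the two models' vectors, and the Spearman rank correlation with frequency is computed by hand to avoid dependencies. All names and parameter values are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            r[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n_words=200, dim=16, sigma=3.0, seed=0):
    """Toy signal-plus-noise model: two 'models' observe the same word signal
    with independent noise whose scale shrinks as 1/sqrt(word frequency)."""
    rng = random.Random(seed)
    freqs, alignments = [], []
    for _ in range(n_words):
        freq = int(10 ** rng.uniform(0, 4))  # hypothetical counts from 1 to about 10^4
        signal = [rng.gauss(0, 1) for _ in range(dim)]  # shared, model-independent signal
        std = sigma / math.sqrt(freq)  # assumed finite-sample noise scale
        model_a = [s + rng.gauss(0, std) for s in signal]
        model_b = [s + rng.gauss(0, std) for s in signal]
        freqs.append(freq)
        alignments.append(cosine(model_a, model_b))
    return freqs, alignments


freqs, alignments = simulate()
print(f"Spearman(frequency, alignment) = {spearman(freqs, alignments):.2f}")
```

Under these assumptions the correlation comes out strongly positive, the same qualitative pattern the abstract reports for LLMs and text embedding models. The simulation only shows that scarcity-driven noise is sufficient to produce such a pattern; it does not show that this is the mechanism in real models.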
Problem

Research questions and friction points this paper is trying to address.

representation alignment
Platonic Representation Hypothesis
Linear Representation Hypothesis
cross-modal alignment
representational noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Representation Hypothesis
Sparse Autoencoders
Cross-modal Alignment
Representation Bias
Representational Noise