Representation Alignment Rests on Linear Structure

📅 2026-05-22

🤖 AI Summary

This work investigates the origins of cross-model representation alignment, as described by the Platonic Representation Hypothesis (PRH), and argues that the linear structure posited by the Linear Representation Hypothesis (LRH) helps explain it. Using a statistical framework that separates representations into signal, model bias, and training noise, the study extracts linear object–attribute features with sparse autoencoders and applies centering and normalization to reduce model-specific bias. The sparse representations often show stronger cross-modal alignment than their dense counterparts, and centering and normalization consistently improve cross-model alignment. Word frequency correlates positively with alignment strength in LLMs and text embedding models, supporting the view that data scarcity drives representational noise.

📝 Abstract

We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to their diverse architectures and training procedures. We show that this difference can be partially mitigated: centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.
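The bias point in the abstract can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration and is not taken from the paper: the data are random, the two "models" are orthogonal transforms of a shared signal plus a model-specific offset, and alignment is scored with a mutual k-nearest-neighbour overlap, since the abstract does not name its metric. The function names `center_and_normalize`, `knn_indices`, and `mutual_knn_alignment` are hypothetical.

```python
import numpy as np

def center_and_normalize(X):
    """Subtract the mean embedding, then scale each row to unit L2 norm."""
    Xc = X - X.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.maximum(norms, 1e-12)

def knn_indices(X, k):
    """Indices of each row's k nearest neighbours by inner product, excluding itself."""
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]

def mutual_knn_alignment(A, B, k=10):
    """Mean fraction of k-nearest neighbours shared by two representations of the same inputs."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(na, nb)]
    return float(np.mean(overlap))

# Synthetic setup (assumed): a shared signal Z seen through two different
# orthogonal maps, each with its own large constant offset playing the role of model bias.
rng = np.random.default_rng(0)
n, d = 200, 16
Z = rng.normal(size=(n, d))
Qa = np.linalg.qr(rng.normal(size=(d, d)))[0]
Qb = np.linalg.qr(rng.normal(size=(d, d)))[0]
A = Z @ Qa + 5.0 * rng.normal(size=(1, d))
B = Z @ Qb + 5.0 * rng.normal(size=(1, d))

raw_score = mutual_knn_alignment(A, B)
fixed_score = mutual_knn_alignment(center_and_normalize(A), center_and_normalize(B))
print(f"raw: {raw_score:.3f}  centered+normalized: {fixed_score:.3f}")
```

In this toy setting the offsets dominate the raw inner products, so the two models disagree about which points are neighbours; centering removes the offsets and the neighbourhoods coincide again. Real models differ in more than a constant offset, so the improvement there would be partial, in line with the abstract's wording.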

Problem

Research questions and friction points this paper is trying to address.

representation alignment

Platonic Representation Hypothesis

Linear Representation Hypothesis

cross-modal alignment

representational noise

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Representation Hypothesis

Sparse Autoencoders

Cross-modal Alignment