Tracing the Representation Geometry of Language Models from Pretraining to Post-training

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training metrics fail to explain the emergence of complex capabilities in large language models (LLMs). Method: We uncover a three-phase, non-monotonic evolution of representational geometry from pretraining to post-training, characterized by initial collapse, entropy-driven expansion, and compression-driven convergence. Leveraging spectral analysis, we quantify this dynamic with two geometric measures, effective rank (RankMe) and spectral decay rate (α-ReQ), across the OLMo and Pythia model families. Contribution/Results: We empirically demonstrate that geometric phase transitions correlate strongly with downstream performance, and that distinct post-training methods induce characteristic geometric shifts: supervised fine-tuning (SFT) and direct preference optimization (DPO) drive entropy-seeking dynamics that improve in-distribution accuracy at the cost of out-of-distribution robustness, while reinforcement learning with verifiable rewards (RLVR) drives compression-seeking dynamics that enhance reward alignment but reduce generation diversity. We further propose a theoretical account unifying cross-entropy optimization and the representational bottleneck as joint drivers of this geometric evolution, enabling interpretable, mechanistic links between training dynamics and capability emergence.

📝 Abstract
Standard training metrics like loss fail to explain the emergence of complex capabilities in large language models. We take a spectral approach to investigate the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay ($α$-ReQ). With OLMo (1B-7B) and Pythia (160M-12B) models, we uncover a consistent non-monotonic sequence of three geometric phases during autoregressive pretraining. The initial "warmup" phase exhibits rapid representational collapse. This is followed by an "entropy-seeking" phase, where the manifold's dimensionality expands substantially, coinciding with peak n-gram memorization. Subsequently, a "compression-seeking" phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others, a transition marked by significant improvement in downstream task performance. We show these phases can emerge from a fundamental interplay of cross-entropy optimization under skewed token frequencies and representational bottlenecks ($d \ll |V|$). Post-training further transforms geometry: SFT and DPO drive "entropy-seeking" dynamics to integrate specific instructional or preferential data, improving in-distribution performance while degrading out-of-distribution robustness. Conversely, RLVR induces "compression-seeking", enhancing reward alignment but reducing generation diversity.
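The two spectral measures can be sketched from their standard definitions: RankMe is the exponential of the entropy of the normalized singular-value distribution of a feature matrix, and $α$-ReQ is the power-law decay exponent of the covariance eigenspectrum ($\lambda_i \propto i^{-\alpha}$), fit by linear regression in log-log space. This is a minimal illustration on random features, not the paper's actual evaluation pipeline; function names and the eigenvalue cutoff are my own choices.

```python
import numpy as np

def rankme(features: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank: exp of the entropy of the normalized singular values."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))

def alpha_req(features: np.ndarray, cutoff: float = 1e-12) -> float:
    """Spectral decay rate: alpha from a log-log linear fit of the sorted
    covariance eigenspectrum, lambda_i ~ i^(-alpha)."""
    x = features - features.mean(axis=0, keepdims=True)
    cov = x.T @ x / len(x)
    eig = np.linalg.eigvalsh(cov)[::-1]       # descending eigenvalues
    eig = eig[eig > cutoff]                   # drop numerical zeros
    idx = np.arange(1, len(eig) + 1)
    slope, _ = np.polyfit(np.log(idx), np.log(eig), 1)
    return float(-slope)

# Toy example: isotropic Gaussian features (n tokens x d hidden dims)
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 64))
print(f"RankMe: {rankme(feats):.1f}, alpha-ReQ: {alpha_req(feats):.2f}")
```

Isotropic features give a RankMe near the full dimension $d$ and a small $\alpha$ (flat spectrum); representational collapse shows up as RankMe dropping toward 1 and $\alpha$ growing (steep spectral decay).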
Problem

Research questions and friction points this paper is trying to address.

Investigating geometric phases in language model representations during pretraining
Analyzing how post-training methods alter representation geometry and capabilities
Explaining emergent abilities through spectral analysis of representation manifolds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral analysis measures representation geometry changes
Three geometric phases emerge during autoregressive pretraining
Post-training methods alter geometry for specific objectives