The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws

📅 2026-05-10

🤖 AI Summary
This work addresses a limitation of existing scaling laws for sparse autoencoders (SAEs): fitted at single layers, they do not explain the sharp variation in reconstruction error across network layers. The authors present a cross-layer SAE scaling study, which they describe as the first of its kind, and argue that the curvature and intrinsic dimension of the activation manifold jointly shape layer-specific width exponents, linking manifold geometry to layer-dependent scaling laws. The approach has two stages: first fitting a scaling-law surface per layer, then regressing the fitted parameters and derived width exponents on four layerwise geometric summaries. Experiments on 844 Gemma Scope residual-stream SAEs across 68 layers of Gemma 2 2B and 9B show that manifold geometry predicts the per-layer width exponent in both models, and that regression coefficients learnt on one model predict the other model's exponents. At showcase layers with richer width grids, the fitted reconstruction-error floor tracks the geometric ordering, with higher curvature and intrinsic dimension corresponding to a higher floor.
📝 Abstract
Sparse autoencoders (SAEs) operationalise the linear representation hypothesis: they reconstruct model activations as sparse linear combinations of interpretable dictionary atoms, on the implicit assumption that activation space is well approximated by a globally linear structure. Their reconstruction error varies sharply across layers in ways that existing scaling laws, fitted at single layers, do not explain. We argue that this variation is the empirical trace of a geometric mismatch: where the activation manifold is curved and its intrinsic dimension varies across layers, no sparse linear dictionary can match it uniformly, and the SAE's width-sparsity scaling becomes a layer-dependent function of manifold structure rather than a single universal law. We conduct the first cross-layer SAE scaling study, fitting and regressing on 844 residual-stream Gemma Scope SAE checkpoints across 68 layers of Gemma 2 2B and 9B. Stage 1 fits a per-layer scaling-law surface; Stage 2 regresses the fitted parameters and the derived per-layer width exponents on four layerwise geometric summaries. We find that manifold geometry predicts the per-layer width exponent in both models, and that the same regression coefficients learnt on one model predict the other model's per-layer exponents under cross-model transfer, indicating a transferable geometric law. At the showcase layers where richer width grids permit identification of the asymptotic floor, we find that the fitted floor tracks the layerwise geometric ordering: higher curvature and intrinsic dimension correspond to higher floor, consistent with the irreducible second-order residual that any sparse linear approximation of a curved manifold must leave behind. SAEs thus encounter not a finite-resource ceiling but a geometry-dependent wall, set by the manifold they are trying to reconstruct.
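The two-stage procedure described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' pipeline: it assumes a floor-free power law `loss ≈ a * width**(-alpha)` in width only (the paper's actual surface also involves sparsity and an asymptotic floor, whose form is not given here), uses ordinary least squares for Stage 2, and runs on synthetic data. The function names and the four placeholder geometry columns are hypothetical.

```python
import numpy as np


def fit_width_exponent(widths, losses):
    """Stage 1 (simplified assumption): fit loss ~ a * width**(-alpha)
    by a straight-line least-squares fit in log-log space."""
    slope, intercept = np.polyfit(np.log(widths), np.log(losses), deg=1)
    return -slope, np.exp(intercept)  # (alpha, a)


def design_matrix(geometry):
    """Prepend an intercept column to the layer-by-summary geometry matrix."""
    return np.column_stack([np.ones(len(geometry)), geometry])


def regress_on_geometry(geometry, exponents):
    """Stage 2: ordinary least squares of per-layer exponents on geometry."""
    coef, *_ = np.linalg.lstsq(design_matrix(geometry), exponents, rcond=None)
    return coef


def predict_exponents(coef, geometry):
    """Apply fitted coefficients to another model's geometric summaries."""
    return design_matrix(geometry) @ coef


# Synthetic demo data (entirely made up, noise-free for clarity).
rng = np.random.default_rng(0)
n_layers, n_summaries = 12, 4  # four placeholder geometric summaries per layer
geometry_a = rng.normal(size=(n_layers, n_summaries))
geometry_b = rng.normal(size=(n_layers, n_summaries))
true_coef = np.array([0.5, -0.05, 0.03, 0.02, -0.01])  # intercept + 4 slopes
true_alpha_a = design_matrix(geometry_a) @ true_coef
widths = np.array([2**14, 2**15, 2**16, 2**17, 2**18], dtype=float)

# Stage 1: one exponent per layer of "model A".
fitted_alpha_a = np.array(
    [fit_width_exponent(widths, 3.0 * widths ** (-a))[0] for a in true_alpha_a]
)

# Stage 2: regress exponents on geometry, then transfer to "model B".
coef = regress_on_geometry(geometry_a, fitted_alpha_a)
predicted_alpha_b = predict_exponents(coef, geometry_b)
```

In this toy setting the transfer step simply reuses the coefficients fitted on one model's layers to predict exponents for another model's layers, which mirrors the cross-model test the abstract describes; with real, noisy SAE losses one would also want a floor term and uncertainty estimates.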
Problem

Research questions and friction points this paper is trying to address.

sparse autoencoders
scaling laws
manifold geometry
activation manifold
layerwise variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoders
manifold geometry
scaling laws
geometric wall
cross-layer analysis
Eslam Zaher
ARC Training Centre for Information Resilience (CIRES); School of Mathematics and Physics, University of Queensland
Maciej Trzaskowski
ARC Training Centre for Information Resilience (CIRES); Institute for Molecular Bioscience, University of Queensland; Profenso
Quan Nguyen
ARC Training Centre for Information Resilience (CIRES); Institute for Molecular Bioscience, University of Queensland; QIMR Berghofer Medical Research Institute
Fred Roosta
University of Queensland
Machine Learning, Numerical Optimization, Computational Statistics, Scientific Computing