The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws

📅 2026-05-10

🤖 AI Summary

This work addresses a limitation of existing scaling laws for sparse autoencoders (SAEs): they are fitted at single layers and so do not explain the sharp variation in reconstruction error across network layers. The authors present what they describe as the first cross-layer SAE scaling study, arguing that the curvature and intrinsic dimension of the activation manifold shape layer-specific width exponents, which links manifold geometry to layer-dependent scaling laws. The approach has two stages: first fit a scaling-law surface per layer, then regress the fitted parameters and the derived per-layer width exponents on four layerwise geometric summaries. Experiments on 844 residual-stream Gemma Scope SAE checkpoints across 68 layers of Gemma 2 2B and 9B show that manifold geometry predicts the per-layer width exponent in both models, and that regression coefficients learned on one model predict the other model's per-layer exponents. At showcase layers with richer width grids, the fitted reconstruction-error floor tracks the layerwise geometric ordering: higher curvature and intrinsic dimension correspond to a higher floor.
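The two-stage strategy and the cross-model transfer check can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' pipeline. Several things here are assumptions made for the example: the pure power-law form `error = a * width**(-alpha)` (the source does not state the functional form of its scaling-law surface, which also involves sparsity and a floor), the use of a single geometric summary called `curvature` in place of the paper's four, and all helper names (`ols`, `width_exponent`, `synthetic_model`, `r_squared`).

```python
import math

def ols(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def width_exponent(widths, errors):
    """Stage 1 (assumed form): fit error ~ a * width**(-alpha) in log-log space."""
    slope, _ = ols([math.log(w) for w in widths], [math.log(e) for e in errors])
    return -slope

def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against targets."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical grid of SAE dictionary widths shared by all layers.
WIDTHS = [2 ** k for k in range(14, 18)]

def synthetic_model(curvatures, noise):
    """Build fake per-layer data and return (curvature, fitted exponent) pairs.

    The ground-truth relation alpha = 0.6 - 0.5 * curvature is invented
    purely so the regression has something to recover.
    """
    rows = []
    for i, kappa in enumerate(curvatures):
        true_alpha = 0.6 - 0.5 * kappa + noise * (-1) ** i
        errors = [3.0 * w ** (-true_alpha) for w in WIDTHS]
        rows.append((kappa, width_exponent(WIDTHS, errors)))
    return rows

model_a = synthetic_model([0.1, 0.2, 0.3, 0.4, 0.5], noise=0.0)
model_b = synthetic_model([0.15, 0.25, 0.35, 0.45], noise=0.01)

# Stage 2: regress the per-layer exponents of model A on the geometric summary.
slope, intercept = ols([k for k, _ in model_a], [a for _, a in model_a])

# Transfer: apply model A's coefficients, unchanged, to model B's layers.
pred_b = [slope * k + intercept for k, _ in model_b]
transfer_r2 = r_squared([a for _, a in model_b], pred_b)

print(f"slope={slope:.3f} intercept={intercept:.3f} transfer R^2={transfer_r2:.3f}")
```

A real analysis would replace the single-feature `ols` with a multivariate regression over all four geometric summaries and would fit a surface over both width and sparsity in Stage 1; the structure of fitting per layer, regressing on geometry, then scoring on a held-out model stays the same.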

📝 Abstract

Sparse autoencoders (SAEs) operationalise the linear representation hypothesis: they reconstruct model activations as sparse linear combinations of interpretable dictionary atoms, on the implicit assumption that activation space is well approximated by a globally linear structure. Their reconstruction error varies sharply across layers in ways that existing scaling laws, fitted at single layers, do not explain. We argue that this variation is the empirical trace of a geometric mismatch: where the activation manifold is curved and its intrinsic dimension varies across layers, no sparse linear dictionary can match it uniformly, and the SAE's width-sparsity scaling becomes a layer-dependent function of manifold structure rather than a single universal law. We conduct the first cross-layer SAE scaling study, fitting and regressing on 844 residual-stream Gemma Scope SAE checkpoints across 68 layers of Gemma 2 2B and 9B. Stage 1 fits a per-layer scaling-law surface; Stage 2 regresses the fitted parameters and the derived per-layer width exponents on four layerwise geometric summaries. We find that manifold geometry predicts the per-layer width exponent in both models, and that the same regression coefficients learnt on one model predict the other model's per-layer exponents under cross-model transfer, indicating a transferable geometric law. At the showcase layers where richer width grids permit identification of the asymptotic floor, we find that the fitted floor tracks the layerwise geometric ordering: higher curvature and intrinsic dimension correspond to higher floor, consistent with the irreducible second-order residual that any sparse linear approximation of a curved manifold must leave behind. SAEs thus encounter not a finite-resource ceiling but a geometry-dependent wall, set by the manifold they are trying to reconstruct.
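The "second-order residual" mentioned in the abstract can be illustrated with a standard local expansion. This is a sketch under stated assumptions rather than the paper's own derivation: the chart $\phi$, the tangent basis $J$, the second-order term $H$, the local coordinate $u$, and the curvature bound $\kappa$ are notation introduced here, and the approximation considered is a single tangent plane at $x_0$ rather than a sparse dictionary.

```latex
\begin{align*}
\phi(u) &= x_0 + J u + \tfrac{1}{2} H[u,u] + O(\|u\|^3), \\
r(u) &= \phi(u) - (x_0 + J u) = \tfrac{1}{2} H[u,u] + O(\|u\|^3), \\
\|r(u)\| &\le \tfrac{1}{2}\,\kappa\,\|u\|^2 + O(\|u\|^3),
\qquad \kappa = \sup_{\|v\|=1} \|H[v,v]\|.
\end{align*}
```

Under these assumptions the linear term is captured exactly and the leading error is quadratic in the distance from $x_0$, with a size governed by $\kappa$. Wherever $H \neq 0$, that quadratic term does not vanish in generic directions, which is the intuition behind a floor that rises with curvature.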

Problem

Research questions and friction points this paper is trying to address.

sparse autoencoders

scaling laws

manifold geometry

activation manifold

layerwise variation

Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoders

manifold geometry

scaling laws