🤖 AI Summary
This work investigates optimal scaling laws for shallow neural networks with limited representational capacity learning hierarchical multi-index targets. By combining information-theoretic analysis, spectral estimation, and the small-learning-rate limit of gradient descent, the study derives, for the first time, precise scaling laws governing both subspace recovery and prediction error. These laws reveal a cascade of phase transitions in which features are acquired sequentially according to their hierarchical level, giving rise to learning plateaus. Building on this insight, the authors propose a spectral estimator that requires no prior knowledge of the target structure yet is provably statistically optimal. The work establishes a unified theoretical framework linking scaling laws, plateau phenomena, and spectral structure in hierarchical target learning.
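To make the spectral route to subspace recovery concrete, here is a minimal sketch (Python/NumPy, illustrative only): it forms a label-weighted second-moment matrix and takes its leading eigenvectors to estimate the relevant subspace of a toy multi-index target. The specific Stein-type estimator, the toy link function, and the sample sizes are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def spectral_subspace_estimate(X, y, k):
    """Estimate the k-dimensional relevant subspace of a multi-index target
    from samples (X, y) via the leading eigenvectors of a label-weighted
    second-moment matrix. Generic Stein-type estimator for Gaussian inputs;
    the paper's exact estimator may differ."""
    n, d = X.shape
    # M ~= E[y (x x^T - I)]: irrelevant directions contribute ~0,
    # relevant directions appear as large-magnitude eigenvalues.
    M = (X.T * y) @ X / n - y.mean() * np.eye(d)
    M = (M + M.T) / 2                                  # symmetrize for numerical safety
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(np.abs(eigvals))[::-1]
    return eigvecs[:, order[:k]]                       # d x k orthonormal basis estimate

# Toy check: rank-2 target with Gaussian inputs (illustrative, not the paper's model).
rng = np.random.default_rng(0)
d, n = 30, 20000
W, _ = np.linalg.qr(rng.standard_normal((d, 2)))       # true 2-dim subspace
X = rng.standard_normal((n, d))
z = X @ W
y = z[:, 0] ** 2 - z[:, 1] ** 2                        # simple two-index link
U = spectral_subspace_estimate(X, y, k=2)
overlap = np.linalg.norm(U.T @ W, ord="fro") ** 2 / 2  # 1.0 = perfect recovery
print(f"estimated subspace overlap: {overlap:.3f}")
```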
📝 Abstract
In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
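The abstract's two-stage picture, recover the representation first, then fit the readout, can be sketched as follows. The snippet fits a readout on top of an estimated subspace `U` by ridge regression on random features of the projected inputs; the random-feature map, width `m`, and ridge parameter are illustrative assumptions, and the paper's efficient, statistically optimal readout procedure may differ.

```python
import numpy as np

def fit_readout(X, y, U, m=200, ridge=1e-3, seed=0):
    """Given an estimated feature subspace U (d x k), fit a readout by ridge
    regression on random features of the projected inputs. A minimal sketch of
    'representation first, readout second'; not the paper's exact procedure."""
    rng = np.random.default_rng(seed)
    Z = X @ U                                          # project onto the learned subspace
    k = U.shape[1]
    V = rng.standard_normal((k, m)) / np.sqrt(k)       # fixed random first-layer weights
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)          # random biases
    Phi = np.cos(Z @ V + b)                            # random-feature map in k dimensions
    a = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(m), Phi.T @ y)  # ridge readout
    return lambda Xnew: np.cos((Xnew @ U) @ V + b) @ a

# Usage, continuing the toy spectral-estimation sketch above (X, y, U):
# predictor = fit_readout(X, y, U)
# mse = np.mean((predictor(X) - y) ** 2)               # in-sample error of the fitted readout
```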