Optimal scaling laws in learning hierarchical multi-index models

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the optimal scaling laws for shallow neural networks with limited representational capacity when learning hierarchical multi-index targets. By integrating information-theoretic analysis, spectral estimation, and the small-learning-rate limit of gradient descent, the study rigorously derives, for the first time, precise scaling laws governing both subspace recovery and prediction error. These laws reveal a phase-transition-like process in which features are acquired sequentially according to their hierarchical level, giving rise to learning plateaus. Building on this insight, the authors propose a spectral estimator that requires no prior knowledge of the target structure yet achieves statistical optimality in theory. The work establishes a unified theoretical framework linking scaling laws, plateau phenomena, and spectral structure in the context of hierarchical target learning.

📝 Abstract
In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are learned sequentially through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small-learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned in a statistically optimal way using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
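To make the abstract's "target-agnostic spectral estimator" concrete, here is a minimal sketch of the generic second-moment spectral approach commonly used to recover the hidden subspace of a multi-index model. This is purely illustrative, not the paper's estimator: the single-index target `g(t) = t²`, the dimensions, and the sample size are all hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 30

# Hypothetical single-index target y = g(<w*, x>) with g(t) = t^2.
w_star = np.zeros(d)
w_star[0] = 1.0
X = rng.standard_normal((n, d))
y = (X @ w_star) ** 2

# Second-moment spectral matrix M_hat = (1/n) sum_i y_i x_i x_i^T - mean(y) * I.
# For Gaussian inputs, E[y (x x^T - I)] is supported on span(w*) (Stein's lemma),
# so its top eigenvector estimates the hidden direction without knowing g.
M_hat = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)

eigvals, eigvecs = np.linalg.eigh(M_hat)
w_hat = eigvecs[:, np.argmax(np.abs(eigvals))]

# Alignment with the true direction (up to sign); close to 1 at this sample size.
alignment = abs(w_hat @ w_star)
```

For genuinely multi-index targets, the same idea extends by taking the top-k eigenspace of `M_hat` rather than a single eigenvector; the paper's hierarchical setting concerns how many samples each level of the target needs before its directions emerge from the spectrum.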
Problem

Research questions and friction points this paper is trying to address.

scaling laws
hierarchical multi-index models
representation-limited regime
subspace recovery
prediction error
Innovation

Methods, ideas, or system contributions that make the work stand out.

scaling laws
hierarchical multi-index models
spectral estimator
phase transitions
representation learning