Phase Transitions for Feature Learning in Neural Networks

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how two-layer neural networks extract low-dimensional latent features when learning high-dimensional multi-index models, and identifies the sample-complexity threshold at which feature learning becomes feasible. In the proportional asymptotic regime where both the sample size \(n\) and the input dimension \(d\) tend to infinity with \(n/d \to \delta\), the authors combine high-dimensional statistics, random matrix theory, and gradient-flow analysis to give the first precise characterization of the phase-transition threshold \(\delta_{\text{NN}}\) governing the onset of feature learning. The analysis shows that this threshold is dictated by the negative eigendirections of the Hessian of the empirical risk in the late stage of training, revealing an intrinsic connection between network architecture, training dynamics, and learnability, and providing a rigorous theoretical foundation for understanding feature-learning mechanisms in neural networks.
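
To make the setting concrete, here is a minimal, self-contained sketch (not the paper's code) of the multi-index data model and a subspace-alignment measure of feature learning: isotropic covariates, responses that depend on the input only through a \(k\)-dimensional projection \({\boldsymbol \Theta}_*^{\sf T}{\boldsymbol x}\), and a small two-layer network trained by full-batch gradient descent. The link function `g`, the step size, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): multi-index data in the
# proportional regime n = delta * d, a small two-layer tanh network, and a
# subspace-alignment metric for "feature learning".
import numpy as np

rng = np.random.default_rng(0)

d, delta, k, m = 400, 4.0, 2, 8          # input dim, n/d ratio, latent dim, hidden neurons
n = int(delta * d)

# Ground-truth latent subspace Theta_* in R^{d x k} (orthonormal columns).
Theta_star, _ = np.linalg.qr(rng.standard_normal((d, k)))

def g(z):
    # Illustrative link: responses depend on x only through Theta_*^T x.
    return z[:, 0] * z[:, 1] + np.tanh(z[:, 0])

X = rng.standard_normal((n, d))           # isotropic covariates
y = g(X @ Theta_star)

# Two-layer network f(x) = sum_j a_j * tanh(w_j^T x), trained by full-batch GD.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)
lr = 0.05

def alignment(W, Theta):
    # Average squared projection of normalized first-layer rows onto span(Theta).
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    return np.mean(np.linalg.norm(Wn @ Theta, axis=1) ** 2)

for _ in range(2000):
    H = np.tanh(X @ W.T)                  # (n, m) hidden activations
    r = H @ a - y                         # residuals
    grad_a = H.T @ r / n
    grad_W = ((1 - H ** 2) * np.outer(r, a)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

print(f"alignment with span(Theta_*): {alignment(W, Theta_star):.3f} "
      f"(chance level ~ k/d = {k / d:.3f})")
```

An alignment near \(k/d\) indicates that the first-layer weights have stayed in random directions (no feature learning), while values approaching 1 indicate that they have concentrated on the latent subspace spanned by \({\boldsymbol \Theta}_*\).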

📝 Abstract
According to a popular viewpoint, neural networks learn from data by first identifying low-dimensional representations, and subsequently fitting the best model in this space. Recent works provide a formalization of this phenomenon when learning multi-index models. In this setting, we are given $n$ i.i.d. pairs $({\boldsymbol x}_i,y_i)$, where the covariate vectors ${\boldsymbol x}_i\in\mathbb{R}^d$ are isotropic, and responses $y_i$ only depend on ${\boldsymbol x}_i$ through a $k$-dimensional projection ${\boldsymbol \Theta}_*^{{\sf T}}{\boldsymbol x}_i$. Feature learning amounts to learning the latent space spanned by ${\boldsymbol \Theta}_*$. In this context, we study the gradient descent dynamics of two-layer neural networks under the proportional asymptotics $n,d\to\infty$, $n/d\to\delta$, while the dimension of the latent space $k$ and the number of hidden neurons $m$ are kept fixed. Earlier work establishes that feature learning via polynomial-time algorithms is possible if $\delta>\delta_{\text{alg}}$, for $\delta_{\text{alg}}$ a threshold depending on the data distribution, and is impossible (within a certain class of algorithms) below $\delta_{\text{alg}}$. Here we derive an analogous threshold $\delta_{\text{NN}}$ for two-layer networks. Our characterization of $\delta_{\text{NN}}$ opens the way to study the dependence of learning dynamics on the network architecture and training algorithm. The threshold $\delta_{\text{NN}}$ is determined by the following scenario. Training first visits points for which the gradient of the empirical risk is large and learns the directions spanned by these gradients. Then the gradient becomes smaller and the dynamics becomes dominated by negative directions of the Hessian. The threshold $\delta_{\text{NN}}$ corresponds to a phase transition in the spectrum of the Hessian in this second phase.
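
The two-phase scenario described above (an initial large-gradient phase, followed by dynamics dominated by negative directions of the empirical-risk Hessian) can be probed numerically. The sketch below only illustrates the idea of inspecting the bottom of the Hessian spectrum at a partially trained point; the single-neuron model, the link function, and the two values of $\delta$ are assumptions made for the example, not the paper's computation of $\delta_{\text{NN}}$.

```python
# Illustrative sketch (assumed setup, not the paper's analysis): empirical-risk
# Hessian in w for a single neuron f(x) = a * tanh(w^T x) under squared loss,
# and its smallest eigenvalue at a roughly fitted point, for two sample ratios.
import numpy as np

rng = np.random.default_rng(1)

def hessian_min_eig(d, delta, steps=500, lr=0.1, a=1.0):
    n = int(delta * d)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)                     # single-index direction
    X = rng.standard_normal((n, d))
    y = np.tanh(2.0 * (X @ theta))                     # illustrative link

    w = rng.standard_normal(d) / np.sqrt(d)
    for _ in range(steps):                             # crude GD on w only
        z = X @ w
        r = a * np.tanh(z) - y
        w -= lr * (a * r * (1 - np.tanh(z) ** 2)) @ X / n

    # Hessian of (1/2n) sum_i (a tanh(w^T x_i) - y_i)^2 with respect to w:
    #   H = (1/n) sum_i [a^2 sech^4(z_i) - 2 a r_i sech^2(z_i) tanh(z_i)] x_i x_i^T
    z = X @ w
    t, s2 = np.tanh(z), 1 - np.tanh(z) ** 2
    r = a * t - y
    coef = a ** 2 * s2 ** 2 - 2 * a * r * s2 * t
    H = (X * coef[:, None]).T @ X / n
    return np.linalg.eigvalsh(H)[0]                    # smallest eigenvalue

for delta in (1.0, 8.0):
    print(f"delta = {delta}: lambda_min(Hessian) ~ {hessian_min_eig(d=150, delta=delta):.4f}")
```

In the paper's setting, the relevant threshold depends on the data distribution and architecture; the sketch only shows how one would numerically look for negative Hessian directions in the late stage of training.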
Problem

Research questions and friction points this paper is trying to address.

feature learning
phase transitions
neural networks
multi-index models
gradient descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

phase transition
feature learning
neural networks
Hessian spectrum
multi-index models