🤖 AI Summary
This work identifies a spectral mechanism behind early stopping: under gradient flow, the teacher signal is detectable only within a finite time window before overfitting obscures it. Working in a linear teacher-student model, the study shows how anisotropy in the input covariance creates fast and slow learning directions, and it describes the resulting behavior as a transient Baik–Ben Arous–Péché (BBP) transition. Depending on signal strength and covariance anisotropy, the teacher spike falls into one of three regimes: it never emerges, it emerges and persists, or it is visible only during an intermediate time interval before being reabsorbed into the bulk. The authors map the corresponding phase diagrams. For a two-block covariance, they derive the time-dependent bulk spectrum of the symmetrized weight matrix from a 2×2 Dyson equation and obtain the outlier condition for a rank-one teacher from a rank-two determinant formula. The predictions are validated against finite-size simulations, giving a minimal, analytically tractable mechanism for early stopping driven by anisotropy and noise.
📝 Abstract
Empirical studies of trained models often report a transient regime in which the signal is detectable during a finite window of gradient descent time before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher-student setting. In this framework, learning occurs when an isolated eigenvalue separates from a noisy bulk; in the overfitting regime, that eigenvalue eventually disappears. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a $2\times 2$ Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik–Ben Arous–Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, emerge and persist, or emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism for early stopping as a transient spectral effect driven by anisotropy and noise.
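The setting described in the abstract can be explored numerically with a small sketch. The abstract does not give the loss, normalization, initialization, or parameter values, so everything below is an assumption made for illustration: a squared loss `||Y - X W||_F^2 / (2n)`, zero initialization, a hypothetical rank-one teacher `theta * u u^T`, block variances `1.0` and `0.1`, and the helper names `gradient_flow_weights` and `top_symmetrized_eigenvalue`. Under these assumptions gradient flow has the closed form `W(t) = C^{-1} (I - e^{-C t}) B`, with `C = X^T X / n` and `B = X^T Y / n`, so each eigen-direction of `C` is learned at its own rate.

```python
import numpy as np


def gradient_flow_weights(X, Y, t):
    """Closed-form W(t) for dW/dt = B - C W with W(0) = 0 (assumed setup).

    C = X^T X / n and B = X^T Y / n, i.e. gradient flow on the squared
    loss ||Y - X W||_F^2 / (2n). This is an illustrative convention, not
    necessarily the normalization used in the paper.
    """
    n = X.shape[0]
    C = X.T @ X / n
    B = X.T @ Y / n
    lam, U = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off
    positive = lam > 1e-12
    safe = np.where(positive, lam, 1.0)
    # Per-mode filter (1 - exp(-lam t)) / lam; its lam -> 0 limit is t.
    filt = np.where(positive, -np.expm1(-safe * t) / safe, t)
    return U @ (filt[:, None] * (U.T @ B))


def top_symmetrized_eigenvalue(W):
    """Largest eigenvalue of the symmetrized weight matrix (W + W^T) / 2."""
    return np.linalg.eigvalsh((W + W.T) / 2)[-1]


# Hypothetical parameters, chosen only to make the sketch run quickly.
rng = np.random.default_rng(0)
d, n, theta, noise_std = 60, 240, 2.0, 1.0
variances = np.repeat([1.0, 0.1], d // 2)        # fast block, slow block
X = rng.standard_normal((n, d)) * np.sqrt(variances)
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
W_star = theta * np.outer(u, u)                  # rank-one teacher
Y = X @ W_star + noise_std * rng.standard_normal((n, d))

for t in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    W_t = gradient_flow_weights(X, Y, t)
    print(f"t = {t:8.1f}   top eigenvalue = {top_symmetrized_eigenvalue(W_t):.4f}")
```

Plotting the full eigenvalue histogram of `(W_t + W_t.T) / 2` at each time, rather than only the top eigenvalue, is one way to see whether an eigenvalue sits apart from the bulk. Whether a given parameter choice lands in the "never", "persistent", or "transient" regime is determined by the paper's phase diagram, which this sketch does not reproduce.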