Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks

📅 2026-04-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how one-pass stochastic gradient descent (SGD) escapes poorly generalizing plateaus, and what implicit bias it exhibits, in over-parameterized two-layer quadratic neural networks within a teacher-student framework. In the high-dimensional limit, the dynamics of the student–teacher and student–student overlap matrices reduce to low-dimensional ordinary differential equations. Combining symmetry arguments, conserved quantities, and a spectral analysis of the Hessian of the population loss, the study shows that plateaus correspond to saddle points with at least one negative eigenvalue, while zero-loss solutions form a rotationally symmetric manifold of marginally stable minima. The key findings are that over-parameterization only modestly accelerates escape from the plateau, affecting solely the prefactor of the loss's exponential decay, and that SGD, constrained by conserved quantities, converges stably on the zero-loss manifold to the solution closest to initialization, thereby revealing its implicit bias.
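
To make the dimensional reduction mentioned above concrete, here is a hedged worked example under one common normalization for two-layer quadratic networks, $f(x) = x^\top W^\top W x / N$ for the student and $y(x) = x^\top S^\top S x / N$ for the teacher, with $W \in \mathbb{R}^{p \times N}$ and $S \in \mathbb{R}^{p^* \times N}$ (the paper's exact conventions may differ). For Gaussian inputs $x \sim \mathcal{N}(0, I_N)$, the population loss closes exactly in the overlap matrices $Q = WW^\top/N$, $M = WS^\top/N$, and $T = SS^\top/N$:

$$
\mathcal{L} \;=\; \tfrac{1}{2}\,\mathbb{E}_x\!\left[\big(f(x)-y(x)\big)^2\right]
\;=\; \tfrac{1}{2}\big(\operatorname{Tr}Q-\operatorname{Tr}T\big)^2
\;+\;\operatorname{Tr}Q^2 \;-\; 2\operatorname{Tr}\!\big(MM^\top\big) \;+\; \operatorname{Tr}T^2,
$$

using the Gaussian identity $\mathbb{E}\big[(x^\top A x)^2\big] = (\operatorname{Tr}A)^2 + 2\operatorname{Tr}(A^2)$ for symmetric $A$, applied here to $A = (W^\top W - S^\top S)/N$. Because the loss depends on the $pN$ student weights only through these $O(p^2 + p p^*)$ overlaps, the high-dimensional SGD dynamics can close into low-dimensional ODEs for $(Q, M)$.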
📝 Abstract
We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher–student framework. In the high-dimensional regime, where the input dimension $N$ and the number of samples $M$ diverge at fixed ratio $\alpha = M/N$, and for finite hidden widths $(p,p^*)$ of the student and teacher, respectively, we study the low-dimensional ordinary differential equations that govern the evolution of the student–teacher and student–student overlap matrices. We show that overparameterization ($p>p^*$) only modestly accelerates escape from a plateau of poor generalization by modifying the prefactor of the exponential decay of the loss. We then examine how unconstrained weight norms introduce a continuous rotational symmetry that results in a nontrivial manifold of zero-loss solutions for $p>1$. From this manifold the dynamics consistently selects the solution closest to the random initialization, as enforced by a conserved quantity in the ODEs governing the evolution of the overlaps. Finally, a Hessian analysis of the population-loss landscape confirms that the plateau and the solution manifold correspond to saddles with at least one negative eigenvalue and to marginal minima in the population-loss geometry, respectively.
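
As a companion sketch, the following minimal simulation runs one-pass SGD in this teacher–student setup and tracks the overlaps. The normalization matches the worked example above, while the sizes and hyperparameters (`N`, `p`, `p_star`, `lr`, `steps`) are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's): student width p exceeds
# teacher width p_star, i.e. the overparameterized regime p > p*.
N, p, p_star = 256, 3, 2
lr, steps = 0.05, 100 * N               # one fresh sample per step (one-pass SGD)

S = rng.standard_normal((p_star, N))    # fixed teacher weights
W = rng.standard_normal((p, N))         # random student initialization
T = S @ S.T / N                         # teacher-teacher overlap (constant)

def quad_net(V, x):
    """Two-layer quadratic network: sum_k (v_k . x / sqrt(N))^2."""
    return np.sum((V @ x) ** 2) / N

def population_loss(Q, M, T):
    """Closed-form population loss in the overlaps, for Gaussian inputs."""
    return (0.5 * (np.trace(Q) - np.trace(T)) ** 2
            + np.trace(Q @ Q) - 2.0 * np.trace(M @ M.T) + np.trace(T @ T))

for t in range(steps):
    x = rng.standard_normal(N)                     # fresh i.i.d. Gaussian input
    err = quad_net(W, x) - quad_net(S, x)          # instantaneous error
    W -= lr * 2.0 * err * np.outer(W @ x, x) / N   # gradient step on (1/2) err^2

    if t % (20 * N) == 0:
        Q, M = W @ W.T / N, W @ S.T / N            # student-student / student-teacher
        print(f"alpha = {t / N:6.1f}   population loss = {population_loss(Q, M, T):.4f}")
```

With a random initialization the student–teacher overlap $M$ starts at $O(1/\sqrt{N})$, so the loss typically lingers near the plateau before escaping; the printed trajectory is indexed by $\alpha = t/N$, matching the sample-to-dimension ratio in the abstract.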
Problem

Research questions and friction points this paper is trying to address.

escape dynamics
implicit bias
overparameterization
quadratic networks
one-pass SGD
Innovation

Methods, ideas, or system contributions that make the work stand out.

overparameterization
one-pass SGD
quadratic neural networks
implicit bias
solution manifold