🤖 AI Summary
This work investigates the optimization dynamics and generalization mechanisms of stochastic Gauss–Newton (SGN) methods in overparameterized deep neural networks. We propose an SGN algorithm incorporating Levenberg–Marquardt damping and mini-batch sampling, and establish a non-asymptotic, finite-time convergence bound. Crucially, we uncover, for the first time, a quantitative relationship between the smallest eigenvalue of the Gauss–Newton matrix and uniform stability. Our theoretical analysis explicitly characterizes how batch size, network width, and depth influence both the convergence rate and the generalization error, yielding an interpretable generalization upper bound. Furthermore, we identify curvature-dominated optimization trajectories with low Hessian perturbation as a sufficient condition for favorable generalization. Collectively, these results provide new theoretical insights and rigorous guarantees for second-order optimization methods in deep learning.
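For reference, a standard damped Gauss–Newton update of the kind the summary describes takes the form below; the paper's exact metric and damping schedule are not given here, so this is a hedged sketch of the usual formulation:

$$
\theta_{t+1} = \theta_t - \eta_t \left( J_t^\top J_t + \lambda_t I \right)^{-1} J_t^\top r_t,
$$

where $J_t$ is the Jacobian of the mini-batch residuals $r_t$ at the current parameters $\theta_t$, $\lambda_t > 0$ is the Levenberg–Marquardt damping parameter, and $\eta_t$ is the step size. The smallest eigenvalue of the Gauss–Newton matrix $J_t^\top J_t$ along the trajectory is the quantity that the stability analysis tracks.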
📝 Abstract
An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss–Newton (SGN) method with Levenberg–Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependence on batch size, network width, and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss–Newton matrix along the optimization path yields tighter stability bounds.
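To make the algorithmic setup concrete, here is a minimal NumPy sketch of one mini-batch SGN step with Levenberg–Marquardt damping. The `residual_fn` and `jacobian_fn` callables and the fixed damping constant are hypothetical interfaces introduced for illustration; the paper's actual damping schedule, step size, and network parameterization are not specified here.

```python
import numpy as np

def sgn_step(theta, residual_fn, jacobian_fn, batch, lr=1.0, damping=1e-3):
    """One stochastic Gauss-Newton step with Levenberg-Marquardt damping.

    Hedged sketch: `residual_fn(theta, batch)` returns the mini-batch
    residual vector r (shape (m,)), and `jacobian_fn(theta, batch)` its
    Jacobian J (shape (m, p)); both are assumed interfaces, not the
    paper's API.
    """
    r = residual_fn(theta, batch)               # residuals f(x; theta) - y
    J = jacobian_fn(theta, batch)               # d r / d theta on the batch
    # Damped Gauss-Newton system: (J^T J + damping * I) d = J^T r
    G = J.T @ J + damping * np.eye(theta.size)  # Gauss-Newton matrix + LM damping
    d = np.linalg.solve(G, J.T @ r)
    return theta - lr * d

# Toy usage on linear least squares, f(x; theta) = x @ theta, where the
# Gauss-Newton matrix is exact and the step reduces to damped Newton.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.normal(size=64)
residual_fn = lambda th, b: X[b] @ th - y[b]
jacobian_fn = lambda th, b: X[b]

theta = np.zeros(5)
for _ in range(50):
    batch = rng.choice(64, size=8, replace=False)  # mini-batch sampling
    theta = sgn_step(theta, residual_fn, jacobian_fn, batch)
```

The damping term keeps the linear system well conditioned even when $J^\top J$ is rank-deficient, which is the typical situation in the overparameterized regime where the parameter count far exceeds the batch size.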