🤖 AI Summary
This work investigates the optimization dynamics and generalization mechanisms of stochastic Gauss–Newton (SGN) methods in overparameterized deep neural networks. We propose an SGN algorithm incorporating Levenberg–Marquardt damping and mini-batch sampling, and establish a non-asymptotic, finite-time convergence bound. Crucially, we uncover, for the first time, a quantitative relationship between the smallest eigenvalue of the Gauss–Newton matrix and uniform stability. Our theoretical analysis explicitly characterizes how batch size, network width, and depth influence both the convergence rate and the generalization error, yielding an interpretable generalization upper bound. Furthermore, we identify curvature-dominated optimization trajectories with low Hessian perturbation as a sufficient condition for favorable generalization. Collectively, these results provide new theoretical insights and rigorous guarantees for second-order optimization methods in deep learning.
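For reference, a standard damped Gauss–Newton update of the kind the summary describes takes the form below; the paper's exact metric and damping schedule are not given here, so this is a hedged sketch of the usual formulation:

$$
\theta_{t+1} = \theta_t - \eta_t \left( J_t^\top J_t + \lambda_t I \right)^{-1} J_t^\top r_t,
$$

where $J_t$ is the Jacobian of the mini-batch residuals $r_t$ at the current parameters $\theta_t$, $\lambda_t > 0$ is the Levenberg–Marquardt damping parameter, and $\eta_t$ is the step size. The smallest eigenvalue of the Gauss–Newton matrix $J_t^\top J_t$ along the trajectory is the quantity that the stability analysis tracks.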
📝 Abstract
An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss–Newton (SGN) method with Levenberg–Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependence on batch size, network width, and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss–Newton matrix along the optimization path yields tighter stability bounds.
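To make the algorithmic setup concrete, here is a minimal NumPy sketch of one mini-batch SGN step with Levenberg–Marquardt damping. The `residual_fn` and `jacobian_fn` callables and the fixed damping constant are hypothetical interfaces introduced for illustration; the paper's actual damping schedule, step size, and network parameterization are not specified here.

```python
import numpy as np

def sgn_step(theta, residual_fn, jacobian_fn, batch, lr=1.0, damping=1e-3):
    """One stochastic Gauss-Newton step with Levenberg-Marquardt damping.

    Hedged sketch: `residual_fn(theta, batch)` returns the mini-batch
    residual vector r (shape (m,)), and `jacobian_fn(theta, batch)` its
    Jacobian J (shape (m, p)); both are assumed interfaces, not the
    paper's API.
    """
    r = residual_fn(theta, batch)               # residuals f(x; theta) - y
    J = jacobian_fn(theta, batch)               # d r / d theta on the batch
    # Damped Gauss-Newton system: (J^T J + damping * I) d = J^T r
    G = J.T @ J + damping * np.eye(theta.size)  # Gauss-Newton matrix + LM damping
    d = np.linalg.solve(G, J.T @ r)
    return theta - lr * d

# Toy usage on linear least squares, f(x; theta) = x @ theta, where the
# Gauss-Newton matrix is exact and the step reduces to damped Newton.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.normal(size=64)
residual_fn = lambda th, b: X[b] @ th - y[b]
jacobian_fn = lambda th, b: X[b]

theta = np.zeros(5)
for _ in range(50):
    batch = rng.choice(64, size=8, replace=False)  # mini-batch sampling
    theta = sgn_step(theta, residual_fn, jacobian_fn, batch)
```

The damping term keeps the linear system well conditioned even when $J^\top J$ is rank-deficient, which is the typical situation in the overparameterized regime where the parameter count far exceeds the batch size.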