Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the optimization dynamics and generalization mechanisms of stochastic Gauss–Newton (SGN) methods in overparameterized deep neural networks. We propose an SGN algorithm incorporating Levenberg–Marquardt damping and mini-batch sampling, and establish a non-asymptotic, finite-time convergence bound. Crucially, we uncover, for the first time, a quantitative relationship between the smallest eigenvalue of the Gauss–Newton matrix and uniform stability. Our theoretical analysis explicitly characterizes how batch size, network width, and depth influence both the convergence rate and the generalization error, yielding an interpretable generalization upper bound. Furthermore, we identify curvature-dominated optimization trajectories with low Hessian perturbation as a sufficient condition for favorable generalization. Collectively, these results provide novel theoretical insights and rigorous guarantees for second-order optimization methods in deep learning.

📝 Abstract
An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependencies on the batch size, network width and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.
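The damped SGN update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the averaging convention, and the linear-regression setup used to exercise it are assumptions for the sake of a runnable example.

```python
import numpy as np

def sgn_step(theta, jacobian_fn, residual_fn, batch, lr=1.0, damping=1e-3):
    """One stochastic Gauss-Newton step with Levenberg-Marquardt damping.

    jacobian_fn(theta, batch) -> J, the (batch_size x n_params) Jacobian
    of the model outputs w.r.t. the parameters on the mini-batch.
    residual_fn(theta, batch) -> r, the residuals f(x; theta) - y.
    (Both callables are illustrative placeholders, not the paper's API.)
    """
    J = jacobian_fn(theta, batch)
    r = residual_fn(theta, batch)
    m = len(batch)
    # Damped mini-batch Gauss-Newton matrix: G = J^T J / m + lambda * I
    G = J.T @ J / m + damping * np.eye(theta.shape[0])
    # Mini-batch gradient of the squared loss: J^T r / m
    grad = J.T @ r / m
    # Variable-metric step: theta - lr * G^{-1} grad
    return theta - lr * np.linalg.solve(G, grad)
```

For a linear model the Jacobian is the data matrix itself, so a single full-batch step with small damping essentially solves the least-squares problem, which is the behavior the finite-time analysis quantifies for nonlinear overparameterized networks.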
Problem

Research questions and friction points this paper is trying to address.

Analyzing stochastic Gauss-Newton method for overparameterized deep networks
Establishing finite-time convergence bounds with architecture dependencies
Deriving generalization bounds via stability in overparameterized regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Gauss-Newton method with damping and mini-batch sampling
Finite-time convergence bounds via variable-metric parameter analysis
Non-asymptotic generalization bounds using uniform stability