🤖 AI Summary
This work establishes non-asymptotic convergence guarantees for Stochastic Gradient Langevin Dynamics (SGLD) in the lazy training regime. To address the challenges of kernel degeneracy and intractable convergence rates in deep neural network training, we propose a continuous-time modeling framework based on Itô-type stochastic differential equations (SDEs). Under a Hessian regularity condition on the loss function, we prove that (i) SGLD maintains a non-degenerate Neural Tangent Kernel (NTK) with high probability throughout training; (ii) SGLD converges exponentially fast, in expectation, to the empirical risk minimizer; and (iii) the optimization gap admits explicit finite-time and finite-width upper bounds. These theoretical findings are validated by numerical experiments on regression tasks. To our knowledge, this is the first rigorous analysis of SGLD in deep learning that simultaneously ensures NTK non-degeneracy and provides quantitative, non-asymptotic convergence guarantees.
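For orientation, a continuous-time model of this kind is typically written as an Itô SDE of the generic form below; the exact drift and diffusion scalings analyzed in the paper may differ, so this is only an illustrative sketch.

$$
\mathrm{d}\theta_t \;=\; -\nabla L(\theta_t)\,\mathrm{d}t \;+\; \Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t,
$$

where $\theta_t$ denotes the network parameters, $L$ is the empirical risk, $W_t$ is a standard Brownian motion, and the state-dependent diffusion matrix $\Sigma(\theta_t)$ models the covariance of the minibatch gradient noise (hence "multiplicative and state-dependent noise").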
📝 Abstract
Continuous-time models provide important insights into the training dynamics of optimization algorithms in deep learning. In this work, we develop a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD), an Itô stochastic differential equation (SDE) approximation of stochastic gradient descent in continuous time, in the lazy training regime. We show that, under regularity conditions on the Hessian of the loss function, SGLD with multiplicative and state-dependent noise (i) yields a non-degenerate kernel throughout the training process with high probability and (ii) achieves exponential convergence to the empirical risk minimizer in expectation; in addition, we establish finite-time and finite-width bounds on the optimality gap. We corroborate our theoretical findings with numerical examples in the regression setting.
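As a concrete illustration of the setting described above, the sketch below runs a discretized SGLD update (an Euler-Maruyama step with state-dependent Gaussian noise) on a small two-layer network for a toy regression task, and tracks the training loss together with the smallest eigenvalue of the empirical NTK. The network width, step size, and noise model are placeholder choices for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) on a handful of samples.
n, d, width = 16, 1, 512
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0])

# Two-layer ReLU network in the NTK parameterization: f(x) = a @ relu(W x) / sqrt(width).
W = rng.normal(size=(width, d))
a = rng.normal(size=width)

def grads(W, a, X, y):
    """Gradients of the empirical squared loss w.r.t. (W, a), plus the loss value."""
    h = np.maximum(X @ W.T, 0.0)                       # (n, width) hidden activations
    f = h @ a / np.sqrt(width)                         # (n,) predictions
    r = (f - y) / n                                    # residuals scaled by 1/n
    ga = h.T @ r / np.sqrt(width)                      # gradient w.r.t. output weights
    gW = ((h > 0) * (r[:, None] * a[None, :])).T @ X / np.sqrt(width)
    return gW, ga, 0.5 * np.sum((f - y) ** 2) / n

def ntk_min_eig(W, a, X):
    """Smallest eigenvalue of the empirical NTK Gram matrix built from per-sample Jacobians."""
    h = np.maximum(X @ W.T, 0.0)
    Ja = h / np.sqrt(width)                                               # d f_i / d a
    JW = ((h > 0) * a[None, :])[:, :, None] * X[:, None, :] / np.sqrt(width)
    J = np.concatenate([JW.reshape(n, -1), Ja], axis=1)                   # (n, #params)
    return np.linalg.eigvalsh(J @ J.T).min()

# SGLD as an Euler-Maruyama discretization of d(theta) = -grad L dt + sigma(theta) dW_t,
# here with a simple state-dependent noise scale proportional to sqrt(loss)
# (an illustrative choice of multiplicative noise, not the paper's).
dt, noise_scale, steps = 0.5, 1e-3, 400
for t in range(steps):
    gW, ga, loss = grads(W, a, X, y)
    sigma = noise_scale * np.sqrt(loss)
    W += -dt * gW + np.sqrt(dt) * sigma * rng.normal(size=W.shape)
    a += -dt * ga + np.sqrt(dt) * sigma * rng.normal(size=a.shape)
    if t % 100 == 0:
        print(f"step {t:4d}  loss {loss:.5f}  lambda_min(NTK) {ntk_min_eig(W, a, X):.4f}")
```

In this lazy-training-style setup, the printed minimum NTK eigenvalue should stay bounded away from zero while the loss decays, mirroring the qualitative behavior the paper establishes rigorously.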