🤖 AI Summary
This work establishes non-asymptotic convergence guarantees for Stochastic Gradient Langevin Dynamics (SGLD) in the lazy training regime. To address the challenges of kernel degeneracy and intractable convergence rates in deep neural network training, we propose a continuous-time modeling framework based on Itô-type stochastic differential equations (SDEs). Under a Hessian regularity condition on the loss function, we prove that (i) SGLD maintains a non-degenerate Neural Tangent Kernel (NTK) with high probability throughout training; (ii) SGLD converges exponentially fast, in expectation, to the empirical risk minimizer; and (iii) the optimization gap admits explicit finite-time and finite-width upper bounds. These theoretical findings are validated by numerical experiments on regression tasks. To our knowledge, this is the first rigorous analysis of SGLD in deep learning that simultaneously ensures NTK non-degeneracy and provides quantitative, non-asymptotic convergence guarantees.
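For orientation, a continuous-time model of this kind is typically written as an Itô SDE of the generic form below; the exact drift and diffusion scalings analyzed in the paper may differ, so this is only an illustrative sketch.

$$
\mathrm{d}\theta_t \;=\; -\nabla L(\theta_t)\,\mathrm{d}t \;+\; \Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t,
$$

where $\theta_t$ denotes the network parameters, $L$ is the empirical risk, $W_t$ is a standard Brownian motion, and the state-dependent diffusion matrix $\Sigma(\theta_t)$ models the covariance of the minibatch gradient noise (hence "multiplicative and state-dependent noise").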
📝 Abstract
Continuous-time models provide important insights into the training dynamics of optimization algorithms in deep learning. In this work, we develop a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD), an Itô stochastic differential equation (SDE) approximation of stochastic gradient descent in continuous time, in the lazy training regime. We show that, under regularity conditions on the Hessian of the loss function, SGLD with multiplicative and state-dependent noise (i) yields a non-degenerate kernel throughout the training process with high probability and (ii) achieves exponential convergence to the empirical risk minimizer in expectation; in addition, we establish finite-time and finite-width bounds on the optimality gap. We corroborate our theoretical findings with numerical examples in the regression setting.
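As a concrete illustration of the setting described above, the sketch below runs a discretized SGLD update (an Euler-Maruyama step with state-dependent Gaussian noise) on a small two-layer network for a toy regression task, and tracks the training loss together with the smallest eigenvalue of the empirical NTK. The network width, step size, and noise model are placeholder choices for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) on a handful of samples.
n, d, width = 16, 1, 512
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0])

# Two-layer ReLU network in the NTK parameterization: f(x) = a @ relu(W x) / sqrt(width).
W = rng.normal(size=(width, d))
a = rng.normal(size=width)

def grads(W, a, X, y):
    """Gradients of the empirical squared loss w.r.t. (W, a), plus the loss value."""
    h = np.maximum(X @ W.T, 0.0)                       # (n, width) hidden activations
    f = h @ a / np.sqrt(width)                         # (n,) predictions
    r = (f - y) / n                                    # residuals scaled by 1/n
    ga = h.T @ r / np.sqrt(width)                      # gradient w.r.t. output weights
    gW = ((h > 0) * (r[:, None] * a[None, :])).T @ X / np.sqrt(width)
    return gW, ga, 0.5 * np.sum((f - y) ** 2) / n

def ntk_min_eig(W, a, X):
    """Smallest eigenvalue of the empirical NTK Gram matrix built from per-sample Jacobians."""
    h = np.maximum(X @ W.T, 0.0)
    Ja = h / np.sqrt(width)                                               # d f_i / d a
    JW = ((h > 0) * a[None, :])[:, :, None] * X[:, None, :] / np.sqrt(width)
    J = np.concatenate([JW.reshape(n, -1), Ja], axis=1)                   # (n, #params)
    return np.linalg.eigvalsh(J @ J.T).min()

# SGLD as an Euler-Maruyama discretization of d(theta) = -grad L dt + sigma(theta) dW_t,
# here with a simple state-dependent noise scale proportional to sqrt(loss)
# (an illustrative choice of multiplicative noise, not the paper's).
dt, noise_scale, steps = 0.5, 1e-3, 400
for t in range(steps):
    gW, ga, loss = grads(W, a, X, y)
    sigma = noise_scale * np.sqrt(loss)
    W += -dt * gW + np.sqrt(dt) * sigma * rng.normal(size=W.shape)
    a += -dt * ga + np.sqrt(dt) * sigma * rng.normal(size=a.shape)
    if t % 100 == 0:
        print(f"step {t:4d}  loss {loss:.5f}  lambda_min(NTK) {ntk_min_eig(W, a, X):.4f}")
```

In this lazy-training-style setup, the printed minimum NTK eigenvalue should stay bounded away from zero while the loss decays, mirroring the qualitative behavior the paper establishes rigorously.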