Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks

📅 2024-09-11
🏛️ arXiv.org
📈 Citations: 1
Influential citations: 1
🤖 AI Summary
This work investigates the convergence of continuous-time stochastic gradient descent (SGD) for minimizing the population (expected) loss, with an application to overparametrized linear deep neural networks. Methodologically, it models SGD in continuous time by a stochastic differential equation and combines Lyapunov stability arguments with techniques from nonconvex optimization. The key contribution is a general convergence criterion for such stochastic dynamical systems, extending the result of Chatterjee (2022), which applies only to deterministic gradient descent. Under mild regularity conditions, the SGD trajectories are proven to converge almost surely to the global optimum. For linear deep networks, the paper derives verifiable sufficient conditions for convergence and explains how the noise perturbations can be accommodated without destroying stability, providing a theoretical foundation for optimization in overparametrized models.
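As a rough illustration of the object under study (generic notation of my own, not taken verbatim from the paper), continuous-time SGD for a population loss $L$ is typically modeled as a stochastic differential equation, and convergence is established by controlling the loss along its trajectories:

$$
d\theta_t \;=\; -\nabla L(\theta_t)\,dt \;+\; \sigma(\theta_t)\,dW_t, \qquad \theta_0 = \theta^{(0)},
$$

where $(W_t)_{t\ge 0}$ is a standard Brownian motion and $\sigma$ models the gradient noise; under suitable regularity and initialization conditions one shows that $L(\theta_t) \to \inf_\theta L(\theta)$ almost surely as $t \to \infty$.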

📝 Abstract
We study a continuous-time approximation of the stochastic gradient descent process for minimizing the expected loss in learning problems. The main results establish general sufficient conditions for the convergence, extending the results of Chatterjee (2022) established for (nonstochastic) gradient descent. We show how the main result can be applied to the case of overparametrized linear neural network training.
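For concreteness, here is a minimal numerical sketch (my own illustration under assumed dynamics, not code accompanying the paper): an Euler-Maruyama discretization of a continuous-time SGD equation, applied to an overparametrized two-layer linear network W2 W1 fitting a fixed linear map. The dimensions, noise level, and step size are arbitrary choices made for the example.

```python
# Illustrative sketch (not from the paper): Euler-Maruyama simulation of
# d(theta) = -grad L(theta) dt + sigma dW for an overparametrized
# two-layer *linear* network W2 @ W1 fitting a target linear map A.
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 20                      # input/output dim, overparametrized hidden width
A = rng.standard_normal((d, d))   # target linear map
W1 = 0.1 * rng.standard_normal((k, d))
W2 = 0.1 * rng.standard_normal((d, k))

dt, sigma, T = 1e-3, 0.01, 20.0   # step size, noise level, time horizon

def loss(W1, W2):
    # Population loss for isotropic inputs: 0.5 * ||W2 W1 - A||_F^2
    return 0.5 * np.linalg.norm(W2 @ W1 - A) ** 2

for _ in range(int(T / dt)):
    R = W2 @ W1 - A               # residual
    g1 = W2.T @ R                 # dL/dW1
    g2 = R @ W1.T                 # dL/dW2
    # Euler-Maruyama step: gradient drift plus Brownian increment
    W1 += -g1 * dt + sigma * np.sqrt(dt) * rng.standard_normal(W1.shape)
    W2 += -g2 * dt + sigma * np.sqrt(dt) * rng.standard_normal(W2.shape)

print(f"final loss: {loss(W1, W2):.4f}")
```

With a small noise level the loss typically decays to near zero, which mirrors, at a heuristic level, the almost-sure convergence to the global optimum that the paper analyzes in the continuous-time setting.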
Problem

Research questions and friction points this paper is trying to address.

Analyzing convergence conditions for continuous-time stochastic gradient descent
Extending convergence theory from deterministic to stochastic optimization
Applying convergence results to overparametrized neural network training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-time stochastic gradient descent approximation
General sufficient convergence conditions established
Applied to overparametrized neural network training
Gabor Lugosi
ICREA and Universitat Pompeu Fabra
machine learning, learning theory, probability, statistics, random graphs
Eulalia Nualart
Universitat Pompeu Fabra and Barcelona School of Economics, Department of Economics and Business, Ramón Trias Fargas 25-27, 08005, Barcelona, Spain