Effective continuous equations for adaptive SGD: a stochastic analysis view

📅 2025-09-25
🤖 AI Summary
The theoretical analysis of adaptive stochastic gradient descent (SGD) under small learning rates remains challenging due to the intricate coupling between the parameter updates and the adaptive second-moment estimates. Method: The paper establishes a rigorous continuous-time limit via stochastic differential equations (SDEs), combining the diffusion-approximation (stochastic modified equations) framework of Li et al. with multiscale analysis and an extension of the scaling-rule approach of Malladi et al. Contribution/Results: It proves that, as the learning rate vanishes, the sampling noise in the joint evolution of the parameters and the second-moment estimates behaves asymptotically as the superposition of two independent Brownian motions, and it derives precise scaling laws linking the learning rate to the key hyperparameters, yielding a unified SDE limit that covers mainstream adaptive optimizers such as Adam and RMSProp. The result is a theoretical characterization of the intrinsic noise structure and of the coupled bivariate dynamics of adaptive optimizers, giving a tight, analytically tractable continuous approximation of their discrete-time behaviour.
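For orientation, a minimal sketch of the coupled discrete-time system the summary refers to, in standard RMSProp-style notation (θ parameters, v second-moment estimate, g stochastic gradient, η learning rate, β EMA coefficient, ε stabiliser); this is the textbook recursion, not the paper's exact formulation:

\begin{aligned}
v_{k+1} &= \beta\, v_k + (1-\beta)\, g(\theta_k, \xi_k)^2, \\
\theta_{k+1} &= \theta_k - \eta\, \frac{g(\theta_k, \xi_k)}{\sqrt{v_{k+1}} + \epsilon},
\end{aligned}

where ξ_k encodes the minibatch sampling noise; the "coupled bivariate dynamics" above is the joint evolution of (θ_k, v_k).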

📝 Abstract
We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Using the stochastic modified equations framework introduced by Li et al., we derive effective continuous stochastic dynamics for these methods. Our key contribution is to show that sampling-induced noise in SGD manifests in the limit as independent Brownian motions driving the parameter and gradient second-moment evolutions. Furthermore, extending the approach of Malladi et al., we investigate scaling rules between the learning rate and key hyperparameters in adaptive methods, characterising all non-trivial limiting dynamics.
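Schematically, the limiting object announced here is a joint SDE of the following shape, with drift coefficients b and diffusion coefficients σ left abstract (the paper derives their precise form, which is not reproduced here):

\begin{aligned}
d\Theta_t &= b_\theta(\Theta_t, V_t)\, dt + \sigma_\theta(\Theta_t, V_t)\, dW_t, \\
dV_t &= b_v(\Theta_t, V_t)\, dt + \sigma_v(\Theta_t, V_t)\, dB_t,
\end{aligned}

with (W_t) and (B_t) independent Brownian motions; their independence is the structural claim of the abstract.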
Problem

Research questions and friction points this paper is trying to address.

Analyzing adaptive SGD methods in small learning rate regimes
Deriving continuous stochastic dynamics for adaptive optimization algorithms
Investigating scaling relationships between the learning rate and key hyperparameters (a generic scaling ansatz is sketched after this list)
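For the last question above, the usual ansatz in this literature (following Malladi et al.) ties the exponential-moving-average hyperparameters to the learning rate as it vanishes; the exponents a, b below are placeholders, since the point of the analysis is to identify which choices yield non-trivial limits:

1 - \beta_1 = c_1\, \eta^{a}, \qquad 1 - \beta_2 = c_2\, \eta^{b}, \qquad \eta \to 0.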
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives continuous stochastic dynamics for adaptive SGD
Models sampling noise as independent Brownian motions
Investigates how key hyperparameters must scale with the learning rate (an illustrative simulation sketch follows this list)
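As a purely illustrative companion to these points (not the paper's construction), the sketch below runs an RMSProp-style recursion on a toy quadratic loss next to an Euler-Maruyama discretisation of a schematic joint SDE driven by two independent Brownian motions; every coefficient is a placeholder choice, not a derived one.

# Illustrative sketch: RMSProp on a toy quadratic loss with Gaussian
# gradient noise, alongside an Euler-Maruyama discretisation of a
# *schematic* joint SDE of the same shape. Placeholder coefficients only.

import numpy as np

rng = np.random.default_rng(0)

eta = 1e-3           # learning rate
beta = 1 - 10 * eta  # EMA coefficient tied to eta (placeholder scaling)
eps = 1e-8
sigma = 0.5          # std of the sampling noise on the gradient
T = 1.0              # horizon in rescaled time t = k * eta

def grad(theta):
    """Stochastic gradient of the toy loss 0.5 * theta**2."""
    return theta + sigma * rng.normal()

# --- discrete RMSProp-style recursion ---
theta, v = 1.0, 0.0
n_steps = int(T / eta)
for _ in range(n_steps):
    g = grad(theta)
    v = beta * v + (1 - beta) * g**2
    theta -= eta * g / (np.sqrt(v) + eps)
print(f"discrete RMSProp:   theta = {theta:.4f}, v = {v:.4f}")

# --- Euler-Maruyama for a schematic joint SDE, with two independent
# Brownian increments driving (Theta, V) as in the claimed limit ---
Theta, V = 1.0, sigma**2
dt = eta
for _ in range(n_steps):
    dW, dB = rng.normal(scale=np.sqrt(dt), size=2)  # independent drivers
    drift_T = -Theta / (np.sqrt(V) + eps)
    diff_T = sigma / (np.sqrt(V) + eps)
    drift_V = 10.0 * (Theta**2 + sigma**2 - V)  # relaxation toward E[g^2]
    diff_V = 1.0                                # placeholder diffusion
    # SME-style scheme: O(1) drift, sqrt(eta)-sized diffusion per unit time
    Theta += drift_T * dt + np.sqrt(eta) * diff_T * dW
    V += drift_V * dt + np.sqrt(eta) * diff_V * dB
    V = max(V, 0.0)  # keep the second-moment proxy nonnegative
print(f"schematic SDE (EM): Theta = {Theta:.4f}, V = {V:.4f}")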
Luca Callisti
Dipartimento di Matematica, Università di Pisa, Largo Bruno Pontecorvo 5, I–56127 Pisa, Italy
Marco Romito
Università di Pisa
Francesco Triggiano
Scuola Normale Superiore, Piazza dei Cavalieri, 7, 56126 Pisa, Italy