Almost sure convergence rates of stochastic gradient methods under gradient domination

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
📄 PDF
🤖 AI Summary
This work investigates the **almost-sure convergence rate** of stochastic gradient descent (SGD) and its momentum variants at the **last iterate**, under global or local $\beta$-gradient dominance. While prior analyses predominantly establish convergence in expectation, this paper provides the first almost-sure convergence rate guarantees for SGD (with and without momentum) under this condition. It shows that the function value error $f(X_n) - f^*$ decays almost surely as $o\big(n^{-\frac{1}{4\beta-1}+\varepsilon}\big)$ for any $\varepsilon > 0$, matching the known optimal rate in expectation up to an arbitrarily small exponent. Technically, the analysis integrates gradient-dominance arguments, almost-sure convergence theory, and functional inequalities. The theoretical findings are empirically validated on supervised learning and reinforcement learning training tasks, offering stronger probabilistic convergence assurances for stochastic optimization algorithms.

📝 Abstract
Stochastic gradient methods are among the most important algorithms in training machine learning problems. While classical assumptions such as strong convexity allow a simple analysis, they are rarely satisfied in applications. In recent years, global and local gradient domination properties have been shown to be a more realistic replacement of strong convexity. They were proved to hold in diverse settings such as (simple) policy gradient methods in reinforcement learning and training of deep neural networks with analytic activation functions. We prove almost sure convergence rates $f(X_n)-f^*\in o\big(n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic gradient descent (with and without momentum) under global and local $\beta$-gradient domination assumptions. The almost sure rates get arbitrarily close to recent rates in expectation. Finally, we demonstrate how to apply our results to the training task in both supervised and reinforcement learning.
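As a minimal numerical illustration of last-iterate convergence, the sketch below runs plain SGD with polynomially decaying step sizes on a simple quadratic, which is strongly convex and therefore satisfies a gradient domination (Polyak-Lojasiewicz type) inequality. All names, step-size choices, and noise levels here are illustrative assumptions, not the paper's experimental setup.

```python
import random

def f(x):
    # Toy objective f(x) = 0.5 * x^2 with minimum f* = 0 at x = 0.
    # Strong convexity implies a gradient domination inequality holds.
    return 0.5 * x * x

def sgd_last_iterate(x0, n_steps, gamma=0.5, alpha=0.75, noise=0.1, seed=0):
    """Plain SGD with steps gamma / (n+1)^alpha on a noisy gradient
    oracle grad f(x) + xi; returns the last iterate X_n (illustrative
    parameters, not taken from the paper)."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        grad = x + rng.gauss(0.0, noise)   # stochastic gradient of f
        x -= gamma / (n + 1) ** alpha * grad
    return x

x_final = sgd_last_iterate(x0=5.0, n_steps=20000)
print(f(x_final))  # function value error f(X_n) - f* at the last iterate
```

Along a single run such as this, the function value error at the last iterate shrinks towards zero; the paper's contribution is to quantify how fast this happens almost surely, i.e. for almost every realization of the gradient noise, rather than only on average.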
Problem

Research questions and friction points this paper is trying to address.

Classical assumptions such as strong convexity rarely hold in machine learning applications
Prior analyses under gradient domination give rates only in expectation, not almost surely
Last-iterate guarantees are needed for supervised and reinforcement learning training tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves first almost-sure last-iterate convergence rates under global and local β-gradient domination
Covers stochastic gradient descent both with and without momentum
Demonstrates applicability to training in supervised and reinforcement learning