🤖 AI Summary
This work addresses theoretical limitations in the asymptotic convergence analysis of stochastic gradient descent (SGD) for nonconvex stochastic optimization. Classical analyses rely on the Robbins–Monro step-size condition (e.g., ∑εₜ² < ∞) and strong regularity assumptions—namely, globally Lipschitz continuous gradients and bounded higher-order moments. We propose a novel analytical framework grounded in stopping-time arguments and martingale convergence theory. For the first time, under the significantly relaxed step-size conditions ∑εₜ = ∞ and ∑εₜᵖ < ∞ for some p > 2, we rigorously establish almost-sure convergence of the SGD iterates to critical points and derive an associated L₂ convergence rate. Crucially, our analysis dispenses with the global Lipschitz gradient assumption, substantially broadening the applicability of SGD convergence theory. Moreover, it accommodates practical step-size schedules—including constant and polynomially decaying steps—enhancing alignment with empirical training practices.
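To make the relaxed step-size condition concrete, here is a small numeric sketch (my own illustration, not from the paper): the polynomial schedule εₜ = t^(−0.4) satisfies the paper's conditions with p = 3, since ∑ t^(−0.4) diverges while ∑ t^(−1.2) converges, yet it violates the classical Robbins–Monro requirement because ∑ t^(−0.8) also diverges.

```python
def partial_sums(q: float, n: int) -> float:
    """Partial sum of sum_{t=1}^{n} t^(-q)."""
    return sum(t ** (-q) for t in range(1, n + 1))

# Schedule eps_t = t^(-0.4), checked against the two conditions:
#   sum eps_t   = sum t^(-0.4) -> diverges   (needed: = infinity)
#   sum eps_t^3 = sum t^(-1.2) -> converges  (relaxed condition with p = 3)
#   sum eps_t^2 = sum t^(-0.8) -> diverges   (so Robbins-Monro fails)
s1_small, s1_large = partial_sums(0.4, 10_000), partial_sums(0.4, 100_000)
s3_small, s3_large = partial_sums(1.2, 10_000), partial_sums(1.2, 100_000)

print(f"sum t^-0.4 up to 1e4 vs 1e5: {s1_small:.1f} -> {s1_large:.1f}  (keeps growing)")
print(f"sum t^-1.2 up to 1e4 vs 1e5: {s3_small:.4f} -> {s3_large:.4f}  (levels off)")
```

Ten-fold more terms add over a thousand to the first sum but only a few tenths to the second, which is the qualitative gap the conditions formalize.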
📝 Abstract
Stochastic Gradient Descent (SGD) is widely used in machine learning research. Previous convergence analyses of SGD under the vanishing step-size setting typically require Robbins–Monro conditions. However, in practice, a wider variety of step-size schemes are frequently employed, yet existing convergence results remain limited and often rely on strong assumptions. This paper bridges this gap by introducing a novel analytical framework based on a stopping-time method, enabling asymptotic convergence analysis of SGD under more relaxed step-size conditions and weaker assumptions. In the non-convex setting, we prove the almost sure convergence of SGD iterates for step-sizes $\{\epsilon_t\}_{t \geq 1}$ satisfying $\sum_{t=1}^{+\infty} \epsilon_t = +\infty$ and $\sum_{t=1}^{+\infty} \epsilon_t^p < +\infty$ for some $p > 2$. Compared with previous studies, our analysis eliminates the global Lipschitz continuity assumption on the loss function and relaxes the boundedness requirements for higher-order moments of stochastic gradients. Building upon the almost sure convergence results, we further establish $L_2$ convergence. These significantly relaxed assumptions make our theoretical results more general, thereby enhancing their applicability in practical scenarios.
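A minimal sketch of the setting being analyzed (my own toy example, not the paper's experiment): SGD with unbiased noisy gradients on a simple nonconvex loss, run with a polynomially decaying schedule that is covered by the relaxed condition but not by Robbins–Monro. The loss, noise model, and constants below are all illustrative choices.

```python
import random

random.seed(0)

def grad(x: float) -> float:
    # Gradient of the nonconvex loss f(x) = (x^2 - 1)^2,
    # whose critical points are x = -1, 0, and 1.
    return 4.0 * x * (x * x - 1.0)

x = 3.0  # arbitrary starting point
for t in range(1, 200_001):
    # eps_t = c * t^(-0.4): sum eps_t diverges, sum eps_t^3 converges,
    # but sum eps_t^2 diverges, so Robbins-Monro does not cover it.
    eps_t = 0.01 * t ** (-0.4)
    g = grad(x) + random.gauss(0.0, 1.0)  # unbiased stochastic gradient
    x -= eps_t * g

print(f"final iterate: {x:.3f}, gradient there: {grad(x):.4f}")
```

The iterate settles near a critical point (here the basin around x = 1), matching the kind of almost-sure convergence to critical points the paper establishes for this step-size family.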