Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses theoretical limitations in the asymptotic convergence analysis of stochastic gradient descent (SGD) for non-convex stochastic optimization. Classical analyses rely on the Robbins–Monro step-size conditions (∑εₜ = ∞ and ∑εₜ² < ∞) together with strong regularity assumptions, namely globally Lipschitz continuous gradients and bounded higher-order moments of the stochastic gradients. We propose a novel analytical framework grounded in stopping-time arguments and martingale convergence theory. For the first time, under the significantly relaxed step-size conditions ∑εₜ = ∞ and ∑εₜᵖ < ∞ for some p > 2, we rigorously establish almost-sure convergence of the SGD iterates to critical points and, building on that result, convergence in L₂. Crucially, our analysis dispenses with the global Lipschitz gradient assumption, substantially broadening the applicability of SGD convergence theory, and it accommodates slowly decaying polynomial schedules such as εₜ ∝ t^(−q) with q ∈ (1/p, 1], including choices like εₜ ∝ t^(−1/2) that the classical conditions exclude, aligning the theory more closely with empirical training practice.
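As a quick, illustrative sanity check (not from the paper), the snippet below compares the two regimes numerically: the schedule εₜ = t^(−1/2) violates the Robbins–Monro requirement ∑εₜ² < ∞ yet satisfies the relaxed condition ∑εₜᵖ < ∞ for any p > 2. The horizon T and the choice p = 3 are arbitrary.

```python
# Illustrative check (not from the paper): eps_t = t**(-0.5) fails the
# Robbins-Monro condition because sum(eps_t**2) diverges like log(T),
# yet meets the relaxed condition sum(eps_t**p) < inf for any p > 2.
T = 10**6
p = 3  # any exponent p > 2 works; 3 is an arbitrary choice

sum_eps = sum_eps_sq = sum_eps_p = 0.0
for t in range(1, T + 1):
    eps = t ** -0.5
    sum_eps += eps          # diverges like 2 * sqrt(T)
    sum_eps_sq += eps ** 2  # harmonic series: diverges like log(T)
    sum_eps_p += eps ** p   # p-series with exponent p/2 > 1: converges

print(f"sum eps   = {sum_eps:10.2f}  (divergent)")
print(f"sum eps^2 = {sum_eps_sq:10.2f}  (divergent, slowly)")
print(f"sum eps^{p} = {sum_eps_p:10.4f}  (convergent)")
```

The partial sums make the gap visible: ∑εₜ² grows without bound (roughly like log T), while ∑εₜ³ converges to ζ(3/2) ≈ 2.61.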

📝 Abstract
Stochastic Gradient Descent (SGD) is widely used in machine learning research. Previous convergence analyses of SGD under the vanishing step-size setting typically require Robbins-Monro conditions. However, in practice, a wider variety of step-size schemes are frequently employed, yet existing convergence results remain limited and often rely on strong assumptions. This paper bridges this gap by introducing a novel analytical framework based on a stopping-time method, enabling asymptotic convergence analysis of SGD under more relaxed step-size conditions and weaker assumptions. In the non-convex setting, we prove the almost sure convergence of SGD iterates for step-sizes $\{\epsilon_t\}_{t \geq 1}$ satisfying $\sum_{t=1}^{+\infty} \epsilon_t = +\infty$ and $\sum_{t=1}^{+\infty} \epsilon_t^p < +\infty$ for some $p > 2$. Compared with previous studies, our analysis eliminates the global Lipschitz continuity assumption on the loss function and relaxes the boundedness requirements for higher-order moments of stochastic gradients. Building upon the almost sure convergence results, we further establish $L_2$ convergence. These significantly relaxed assumptions make our theoretical results more general, thereby enhancing their applicability in practical scenarios.
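To make the setting of the abstract concrete, here is a minimal SGD sketch (illustrative constants, not the paper's experiments): the objective f(x) = (x² − 1)² is non-convex and its gradient 4x(x² − 1) is only locally, not globally, Lipschitz, while the schedule εₜ = ε₀t^(−1/2) satisfies the relaxed conditions (with, e.g., p = 3) but not the classical Robbins-Monro conditions.

```python
# Minimal SGD sketch under the relaxed step-size regime (illustrative
# constants; not the paper's experiments). f(x) = (x**2 - 1)**2 is
# non-convex and its gradient 4x(x**2 - 1) is not globally Lipschitz.
import random

def noisy_grad(x, sigma=0.1):
    """Unbiased stochastic gradient of f(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0) + random.gauss(0.0, sigma)

def sgd(x0=0.5, eps0=0.05, q=0.5, steps=50_000):
    """SGD with eps_t = eps0 * t**(-q); any q in (1/p, 1] meets the
    relaxed condition, and q = 0.5 is excluded by Robbins-Monro."""
    x = x0
    for t in range(1, steps + 1):
        x -= eps0 * t ** (-q) * noisy_grad(x)
    return x

random.seed(0)
print(f"final iterate: {sgd():.4f}  (critical points of f: -1, 0, +1)")
```

From x₀ = 0.5 the negative gradient pushes the iterate toward the minimizer at x = 1, and the decaying schedule damps the noise, so the trajectory settles near a critical point, as the almost-sure convergence result predicts.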
Problem

Research questions and friction points this paper is trying to address.

Analyzing SGD convergence under relaxed step-size conditions
Eliminating the global Lipschitz continuity assumption on the loss function
Relaxing boundedness requirements on higher-order moments of stochastic gradients
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel stopping-time method for SGD analysis
Relaxed step-size conditions without global Lipschitz continuity
Weaker assumptions on stochastic gradient moments
Ruinan Jin
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, New Territories, Hong Kong
Difei Cheng
Institute of Automation, Chinese Academy of Sciences
Research interests: clustering, stochastic optimization
Hong Qiao
Institute of Automation, Chinese Academy of Sciences
Xin Shi
Shanghai Jiao Tong University, Shanghai, 200240, China
Shaodong Liu
Shanghai Jiao Tong University, Shanghai, 200240, China
Bo Zhang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China