Convergence of SGD with momentum in the nonconvex case: A novel time window-based analysis

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the convergence behavior and convergence rates of stochastic gradient descent with momentum (SGDM) in the nonconvex setting. Conventional per-iteration analysis is difficult here because SGDM lacks a sufficient descent property and the momentum and stochastic errors are hard to control simultaneously in an almost sure sense. We therefore study the behavior of SGDM over specific time windows instead of the descent between consecutive iterates. Under the Łojasiewicz property, we establish convergence of the SGDM iterates and derive local convergence rates that depend on the Łojasiewicz exponent and on the step size scheme. The results provide theoretical support for SGDM, a common method for large-scale and stochastic optimization problems.

📝 Abstract
The stochastic gradient descent method with momentum (SGDM) is a common approach for solving large-scale and stochastic optimization problems. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and challenges in simultaneously controlling the momentum and stochastic errors in an almost sure sense. To address these challenges, we investigate the behavior of SGDM over specific time windows, rather than examining the descent of consecutive iterates as in traditional studies. This time window-based approach simplifies the convergence analysis and enables us to establish the iterate convergence result for SGDM under the Łojasiewicz property. We further provide local convergence rates which depend on the underlying Łojasiewicz exponent and the utilized step size schemes.
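As a rough illustration of the idea in the abstract, the sketch below runs SGDM on a toy nonconvex function and records the objective only at time-window boundaries rather than at every iterate. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective `f`, the heavy-ball update in exponential-averaging form, the step size rule `0.5 / k**0.75`, the Gaussian noise model, and the rule that a window closes once the accumulated step sizes reach `window_length`.

```python
import random


def f(x):
    # Toy nonconvex objective with minimizers at x = -1 and x = 1 (illustrative assumption).
    return (x * x - 1.0) ** 2


def grad_f(x):
    # Exact derivative of f; noise is added separately inside sgdm.
    return 4.0 * x * (x * x - 1.0)


def sgdm(x0, steps=5000, beta=0.9, window_length=1.0, noise_std=0.1, seed=0):
    """Run SGDM and record f(x) each time a time window closes.

    A window closes once the step sizes accumulated since the previous
    boundary reach window_length (a hypothetical windowing rule).
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    elapsed = 0.0
    window_values = [f(x)]
    for k in range(1, steps + 1):
        alpha = 0.5 / k ** 0.75                    # diminishing step size (assumed scheme)
        g = grad_f(x) + rng.gauss(0.0, noise_std)  # stochastic gradient estimate
        m = beta * m + (1.0 - beta) * g            # momentum as an average of past gradients
        x = x - alpha * m                          # SGDM step
        elapsed += alpha
        if elapsed >= window_length:
            window_values.append(f(x))
            elapsed = 0.0
    return x, window_values


if __name__ == "__main__":
    x_final, values = sgdm(2.0)
    print("final iterate:", x_final)
    print("objective at window boundaries:", values)
```

Consecutive iterates of this run need not decrease the objective, since the momentum term carries the iterate past the minimizer before it settles. Comparing values across window boundaries gives a coarser view of progress, which is the kind of quantity a window-based analysis works with.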
Problem

Research questions and friction points this paper is trying to address.

SGDM Convergence
Nonconvex Problems
Momentum Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time Window-Based Analysis
Convergence Analysis
SGDM Optimization
Junwen Qiu
Industrial Systems Engineering & Management, National University of Singapore
Bohao Ma
School of Data Science (SDS), The Chinese University of Hong Kong, Shenzhen
Andre Milzarek
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
nonsmooth optimization, stochastic optimization, second order methods, second order theory