🤖 AI Summary
This work investigates the convergence behavior and rate of stochastic gradient descent with momentum (SGDM) under nonconvex and nonsmooth conditions. Conventional per-iteration analysis fails to capture the joint effect of momentum and stochastic noise; to address this limitation, we propose a novel time-window-based analytical framework. Under the Łojasiewicz inequality assumption, we establish, for the first time, the almost-sure convergence of the SGDM iterate sequence and derive an explicit local convergence rate that depends on the Łojasiewicz exponent. Our approach integrates stochastic optimization theory, joint bounding techniques for the momentum and stochastic error terms, and time-window sequence control, thereby circumventing standard assumptions of smoothness or strong convexity. The results provide rigorous theoretical foundations for SGDM-type optimizers widely used in deep learning.
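To make the setting concrete, the following LaTeX sketch records the standard heavy-ball form of the SGDM iteration and a generic Łojasiewicz inequality. The notation (step sizes $\alpha_k$, momentum parameters $\beta_k$, exponent $\theta$, constant $C$) is illustrative and may differ from the paper's exact formulation.

```latex
% Illustrative notation (assumed), not necessarily the paper's exact statement.
% SGDM in heavy-ball form, where g_k is a stochastic gradient estimate at x_k:
\begin{align*}
  x_{k+1} &= x_k - \alpha_k g_k + \beta_k \,(x_k - x_{k-1}), \\
% Lojasiewicz inequality near a critical point \bar{x}, with exponent \theta:
  |f(x) - f(\bar{x})|^{\theta} &\le C \,\operatorname{dist}\!\bigl(0, \partial f(x)\bigr)
  \quad \text{for all } x \text{ near } \bar{x}, \ \theta \in [1/2, 1).
\end{align*}
```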
📝 Abstract
The stochastic gradient descent method with momentum (SGDM) is a common approach for solving large-scale and stochastic optimization problems. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and the challenge of simultaneously controlling the momentum and stochastic errors in an almost sure sense. To address these challenges, we investigate the behavior of SGDM over specific time windows, rather than examining the descent of consecutive iterates as in traditional studies. This time-window-based approach simplifies the convergence analysis and enables us to establish an iterate convergence result for SGDM under the Łojasiewicz property. We further provide local convergence rates that depend on the underlying Łojasiewicz exponent and the step size scheme used.
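For readers who think in code, here is a minimal NumPy sketch of the SGDM iteration described above. The function name, step size, momentum value, and toy objective are illustrative defaults, not the step size schedules analyzed in the paper.

```python
import numpy as np

def sgdm(grad_fn, x0, alpha=0.01, beta=0.9, n_iters=1000, seed=0):
    """Heavy-ball SGDM sketch: x_{k+1} = x_k - alpha*g_k + beta*(x_k - x_{k-1}).

    grad_fn(x, rng) returns a stochastic gradient estimate at x.
    alpha and beta are illustrative constants (the paper studies general schedules).
    """
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_fn(x, rng)                           # stochastic gradient sample
        x_next = x - alpha * g + beta * (x - x_prev)  # gradient step + momentum term
        x_prev, x = x, x_next
    return x

if __name__ == "__main__":
    # Toy usage: noisy gradients of f(x) = ||x||^2 / 2, minimized at the origin.
    grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    print(sgdm(grad, np.ones(3)))
```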