🤖 AI Summary
This work studies the convergence of first-order algorithms for nonconvex optimization under generalized smoothness, a condition weaker than Lipschitz continuous differentiability. Classical smoothness assumptions are often too restrictive to characterize practical machine learning objectives; to address this, the authors propose a novel analytical framework termed *self-bounded regularity*. Within this framework, they establish, for the first time, convergence of first-order methods, including perturbed gradient descent and stochastic gradient descent, to second-order stationary points under generalized smoothness, with runtime polylogarithmic in the dimension. The analysis yields an iteration complexity of Õ(1/ε²) for ε-accuracy under assumptions weaker than the Lipschitz-gradient conditions required by prior results. Crucially, the methods require no Hessian information or higher-order derivatives. The authors further show that several canonical nonconvex problems, such as deep learning objectives and matrix factorization, fall within the framework.
📝 Abstract
In this paper, we study the problem of non-convex optimization of functions that are not necessarily smooth, using first-order methods. Smoothness (Lipschitz continuity of the gradient and/or Hessian) is not satisfied by many machine learning problems in both theory and practice, motivating a recent line of work studying the convergence of first-order methods to first-order stationary points under appropriate generalizations of smoothness. We develop a novel framework to study convergence of first-order methods to first- and *second*-order stationary points under generalized smoothness, under more general smoothness assumptions than those in the literature. Using our framework, we show that appropriate variants of GD and SGD (e.g. with appropriate perturbations) can converge not just to first-order but also to *second-order stationary points* in runtime polylogarithmic in the dimension. To our knowledge, our work contains the first such result, as well as the first 'non-textbook' rate for non-convex optimization under generalized smoothness. We demonstrate that several canonical non-convex optimization problems fall under our setting and framework.
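To make the mechanism behind "GD with appropriate perturbations" concrete, here is a minimal sketch of perturbed gradient descent: plain gradient steps, plus a small random perturbation whenever the gradient is small, which is the standard device for escaping strict saddle points and reaching second-order stationary points. All function names, constants, and the perturbation schedule below are illustrative assumptions, not the paper's actual algorithm or parameters (analyses of this kind typically perturb at most once per fixed interval and choose step sizes from the smoothness constants).

```python
import math
import random

def perturbed_gd(grad, x0, eta=0.1, g_thresh=1e-3, radius=1e-3,
                 iters=5000, seed=0):
    """Illustrative sketch of perturbed gradient descent (not the paper's
    exact method). Takes plain gradient steps, but whenever the gradient is
    small -- a candidate stationary point, which may be a saddle -- adds a
    small uniform perturbation so the iterate can escape strict saddles."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= g_thresh:
            # Near-zero gradient: perturb to escape a potential saddle.
            x = [xi + rng.uniform(-radius, radius) for xi in x]
            g = grad(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

# f(x, y) = x^4/4 - x^2/2 + y^2/2 has a strict saddle at the origin and
# minima at (+-1, 0). Started exactly at the saddle, plain GD never moves;
# the perturbed variant escapes to a neighborhood of a minimum.
saddle_grad = lambda v: [v[0] ** 3 - v[0], v[1]]
x_star = perturbed_gd(saddle_grad, [0.0, 0.0])
```

The toy example starts exactly at a strict saddle, where the gradient vanishes and plain gradient descent is stuck forever; the random perturbation pushes the iterate into a descent direction of the negative-curvature coordinate, after which ordinary gradient steps carry it to a minimum.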