Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the convergence of first-order algorithms for nonconvex optimization under generalized smoothness, a weaker condition than Lipschitz continuous differentiability. To address the limitation of classical smoothness assumptions, which are often too restrictive to characterize practical machine learning objectives, the authors propose a novel analytical framework termed *self-bounding regularity*. Within this framework, they establish, for the first time, convergence of first-order methods—including perturbed gradient descent and stochastic gradient descent—to second-order stationary points under generalized smoothness, in runtime polylogarithmic in the dimension. The analysis yields an iteration complexity of Õ(1/ε²) for ε-accuracy, improving upon prior results that rely on Lipschitz gradients. Crucially, the methods require no Hessian information or higher-order derivatives. The framework's applicability is demonstrated on canonical nonconvex problems such as deep learning and matrix factorization.

📝 Abstract
In this paper, we study the problem of non-convex optimization on functions that are not necessarily smooth using first order methods. Smoothness (functions whose gradient and/or Hessian are Lipschitz) is not satisfied by many machine learning problems in both theory and practice, motivating a recent line of work studying the convergence of first order methods to first order stationary points under appropriate generalizations of smoothness. We develop a novel framework to study convergence of first order methods to first and *second* order stationary points under generalized smoothness, under more general smoothness assumptions than the literature. Using our framework, we show appropriate variants of GD and SGD (e.g. with appropriate perturbations) can converge not just to first order but also *second order stationary points* in runtime polylogarithmic in the dimension. To our knowledge, our work contains the first such result, as well as the first 'non-textbook' rate for non-convex optimization under generalized smoothness. We demonstrate that several canonical non-convex optimization problems fall under our setting and framework.
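The abstract refers to variants of GD "with appropriate perturbations" that escape saddle points and reach second-order stationary points. The paper's exact algorithm and parameters are not given here, so the following is only a minimal sketch of the generic perturbed-gradient-descent idea it builds on: take plain gradient steps, and when the gradient norm is small (a candidate saddle or minimum), inject a small random perturbation so the iterate can leave strict saddle points. All parameter names and values below are illustrative assumptions, not the paper's.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, r=1e-3, g_thresh=1e-2, steps=2000, seed=0):
    """Generic perturbed gradient descent (illustrative sketch).

    grad     -- function returning the gradient at a point
    eta      -- step size (assumed constant here)
    r        -- radius of the random perturbation
    g_thresh -- gradient-norm threshold triggering a perturbation
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        if np.linalg.norm(grad(x)) <= g_thresh:
            # Near a stationary point: jitter so a strict saddle is escaped.
            x = x + rng.uniform(-r, r, size=x.shape)
        x = x - eta * grad(x)
    return x

# Toy objective f(x, y) = x^4/4 - x^2/2 + y^2/2:
# (0, 0) is a strict saddle; the minima are at (+-1, 0).
grad_f = lambda v: np.array([v[0] ** 3 - v[0], v[1]])

x_final = perturbed_gd(grad_f, np.zeros(2))
# Plain GD started exactly at the saddle (0, 0) would never move;
# the perturbed variant drifts to one of the minima (+-1, 0).
```

Note that from the exact saddle the gradient is zero, so unperturbed GD is stuck there forever; the random jitter is what makes escape possible, and the paper's contribution is analyzing such schemes under generalized (non-Lipschitz) smoothness with only polylogarithmic dependence on the dimension.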
Problem

Research questions and friction points this paper is trying to address.

Non-convex optimization under generalized smoothness conditions.
Convergence of first-order methods to second-order stationary points.
Runtime polylogarithmic in dimension for non-convex optimization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel framework for non-convex optimization convergence
Generalized smoothness assumptions beyond Lipschitz conditions
Polylogarithmic runtime for second-order stationary points
Daniel Yiming Cao
Cornell University
August Y. Chen
Cornell University
Karthik Sridharan
Cornell University, University of Pennsylvania, Toyota Technological Institute
Learning Theory · Machine Learning · Optimization · Reinforcement Learning
Benjamin Tang
Cornell University