🤖 AI Summary
Theoretical convergence guarantees for nonconvex optimization are often much weaker than the performance observed in practice, exposing a substantial theory-practice gap. Method: We propose the first tunable, unified assumption framework for nonconvex functions, encompassing prominent structural conditions such as the Polyak–Łojasiewicz (PL) inequality and weak convexity, and designed to balance generality with analytical tractability. Building on this framework, we derive unified convergence theorems for both deterministic and stochastic gradient methods, characterizing how convergence rates depend on the tunable parameters. Contribution/Results: Our analysis identifies parameter regimes under which efficient optimization is theoretically feasible. Empirical validation on canonical nonconvex tasks, including neural network training, confirms that the assumed condition holds along actual optimization trajectories. This provides a new, verifiable theoretical foundation for understanding the practical efficacy of nonconvex optimization algorithms.
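For concreteness, two of the structural conditions the framework is said to encompass are standard in the literature; their usual definitions are sketched below. The constants $\mu, \rho > 0$ and the optimal value $f^*$ are generic notation for these textbook definitions, not parameters taken from the paper.

```latex
% Standard definitions of two conditions named above; the constants
% \mu, \rho > 0 and the optimal value f^* are generic notation.

% Polyak--Lojasiewicz (PL) inequality: the gradient norm controls suboptimality.
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \bigl( f(x) - f^* \bigr)
\qquad \text{for all } x.

% Weak convexity (\rho-weak convexity): f becomes convex after adding a quadratic.
x \;\mapsto\; f(x) + \frac{\rho}{2}\,\|x\|^2 \quad \text{is convex.}
```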
📝 Abstract
Nonconvex optimization is central to modern machine learning, but the general nonconvex framework yields weak convergence guarantees that are too pessimistic compared to practice. On the other hand, while convexity enables efficient optimization, it is of limited applicability to many practical problems. To bridge this gap and better understand the practical success of optimization algorithms in nonconvex settings, we introduce a novel unified parametric assumption. Our assumption is general enough to encompass a broad class of nonconvex functions while also being specific enough to enable the derivation of a unified convergence theorem for gradient-based methods. Notably, by tuning the parameters of our assumption, we demonstrate its versatility in recovering several existing function classes as special cases and in identifying functions amenable to efficient optimization. We derive our convergence theorem for both deterministic and stochastic optimization, and conduct experiments to verify that our assumption can hold practically over optimization trajectories.
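The paper's unified assumption is not stated in this abstract, so the minimal sketch below uses the PL inequality as a stand-in to illustrate what "verifying a condition over an optimization trajectory" can look like: it runs plain gradient descent on a toy double-well objective and records the PL ratio at each iterate. The objective, step size, and monitored quantity are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch (not the paper's code): checking a PL-type inequality
# along a gradient-descent trajectory. The toy objective, step size, and the
# choice of the PL ratio as the monitored quantity are assumptions.
import numpy as np

def f(x):
    # Nonconvex toy objective: coordinate-wise double well, shifted so min f = 0.
    return np.sum(0.25 * x**4 - 0.5 * x**2) + 0.25 * len(x)

def grad_f(x):
    return x**3 - x

def pl_ratio(x, f_star=0.0):
    """Return ||grad f(x)||^2 / (2 (f(x) - f_star)). A positive lower bound on
    this ratio along the trajectory plays the role of a PL constant mu."""
    gap = f(x) - f_star
    if gap <= 1e-12:               # already (numerically) optimal
        return np.inf
    return 0.5 * np.linalg.norm(grad_f(x))**2 / gap

rng = np.random.default_rng(0)
x = rng.normal(size=10)            # random initialization
step_size = 0.1

ratios = []
for t in range(200):
    ratios.append(pl_ratio(x))
    x = x - step_size * grad_f(x)  # plain gradient descent

print(f"final objective: {f(x):.3e}")
print(f"min PL ratio along trajectory: {min(ratios):.3e}")
```

A strictly positive minimum ratio indicates that, for this run, the trajectory stays in a region where a PL-style condition holds even though the objective is nonconvex globally.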