🤖 AI Summary
This paper addresses three key limitations of existing online-to-nonconvex conversion frameworks for nonconvex optimization: (i) deterministic methods rely on double loops and fixed-point subroutines, incurring extra logarithmic factors; (ii) stochastic methods require strong boundedness assumptions on second-order gradient moments; and (iii) the deterministic and stochastic settings are treated separately with incompatible algorithms. To overcome these, we propose a unified *doubly optimistic gradient* framework. Its core innovation is a *doubly optimistic hint function*, which leverages extrapolated gradients and gradient smoothness to embed double optimism directly into the linearized updates. This eliminates double loops, weakens the stochastic-gradient assumption to standard bounded variance (without requiring second-moment bounds), and unifies algorithm design across the deterministic and stochastic settings. Theoretically, the method achieves the optimal deterministic complexity $\mathcal{O}(\varepsilon^{-1.75})$ and stochastic complexity $\mathcal{O}(\varepsilon^{-3.5})$, both free of extraneous logarithmic factors.
📝 Abstract
A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of \cite{cutkosky2023optimal}, which reformulates the task of finding an $\varepsilon$-first-order stationary point as an online learning problem. When both the gradient and the Hessian are Lipschitz continuous, instantiating this framework with two different online learners achieves a complexity of $\mathcal{O}(\varepsilon^{-1.75}\log(1/\varepsilon))$ in the deterministic case and a complexity of $\mathcal{O}(\varepsilon^{-3.5})$ in the stochastic case. However, this approach suffers from several limitations: (i) the deterministic method relies on a complex double-loop scheme that solves a fixed-point equation to construct hint vectors for an optimistic online learner, introducing an extra logarithmic factor; (ii) the stochastic method assumes a bounded second-order moment of the stochastic gradient, which is stronger than standard variance bounds; and (iii) different online learning algorithms are used in the two settings. In this paper, we address these issues by introducing an online optimistic gradient method based on a novel \textit{doubly optimistic hint function}. Specifically, we use the gradient at an extrapolated point as the hint, motivated by two optimistic assumptions: that the difference between the hint and the target gradient remains nearly constant, and that consecutive update directions change slowly due to smoothness. Our method eliminates the need for a double loop and removes the logarithmic factor. Furthermore, by simply replacing full gradients with stochastic gradients, and under the standard assumption that their variance is bounded by $\sigma^2$, we obtain a unified algorithm with complexity $\mathcal{O}(\varepsilon^{-1.75} + \sigma^2 \varepsilon^{-3.5})$, smoothly interpolating between the best-known deterministic rate and the optimal stochastic rate.
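To make the mechanism concrete, the following schematic sketches the online-to-nonconvex template with an extrapolated-gradient hint fed to an optimistic online learner. The notation ($\Delta_t$ for the learner's play, $D$ for the movement radius, $\eta$ for the step size) is illustrative: this is a standard optimistic-online-gradient form under the hint choice described in the abstract, not necessarily the paper's exact update.

```latex
\begin{align*}
  x_t &= x_{t-1} + \Delta_t,
      && \text{$\Delta_t$ played by the online learner,} \\
  g_t &= \nabla f(x_t),
      && \text{linearized loss } \ell_t(\Delta) = \langle g_t, \Delta \rangle, \\
  h_{t+1} &= \nabla f(x_t + \Delta_t),
      && \text{hint: gradient at the extrapolated point,} \\
  \tilde{\Delta}_{t+1} &= \Pi_{\|\Delta\| \le D}\big(\tilde{\Delta}_t - \eta\, g_t\big),
  \quad
  \Delta_{t+1} = \Pi_{\|\Delta\| \le D}\big(\tilde{\Delta}_{t+1} - \eta\, h_{t+1}\big).
\end{align*}
```

The point of optimism is that the learner's regret scales with $\sum_t \|g_{t+1} - h_{t+1}\|^2$ rather than $\sum_t \|g_{t+1}\|^2$; the two assumptions named in the abstract (near-constant hint error and slowly varying directions under smoothness) are what keep this quantity small without a fixed-point subroutine.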