Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses three key limitations of existing online-to-nonconvex conversion frameworks for nonconvex optimization: (i) deterministic methods rely on double loops and fixed-point subroutines, incurring extra logarithmic factors; (ii) stochastic methods require strong boundedness assumptions on second-order gradient moments; and (iii) deterministic and stochastic settings are treated separately with incompatible algorithms. To overcome these, we propose a unified *doubly optimistic gradient* framework. Its core innovation is a *doubly optimistic hint function*, which leverages extrapolated gradients and gradient smoothness to embed double optimism directly into linearized updates. This eliminates double loops, weakens stochastic gradient assumptions to standard bounded variance (without requiring second-moment bounds), and unifies algorithm design across deterministic and stochastic settings. Theoretically, our method achieves optimal deterministic complexity $O(\varepsilon^{-1.75})$ and stochastic complexity $O(\varepsilon^{-3.5})$, both free of extraneous logarithmic factors.

📝 Abstract
A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of \cite{cutkosky2023optimal}, which reformulates the task of finding an $\varepsilon$-first-order stationary point as an online learning problem. When both the gradient and the Hessian are Lipschitz continuous, instantiating this framework with two different online learners achieves a complexity of $\mathcal{O}(\varepsilon^{-1.75}\log(1/\varepsilon))$ in the deterministic case and a complexity of $\mathcal{O}(\varepsilon^{-3.5})$ in the stochastic case. However, this approach suffers from several limitations: (i) the deterministic method relies on a complex double-loop scheme that solves a fixed-point equation to construct hint vectors for an optimistic online learner, introducing an extra logarithmic factor; (ii) the stochastic method assumes a bounded second-order moment of the stochastic gradient, which is stronger than standard variance bounds; and (iii) different online learning algorithms are used in the two settings. In this paper, we address these issues by introducing an online optimistic gradient method based on a novel \textit{doubly optimistic hint function}. Specifically, we use the gradient at an extrapolated point as the hint, motivated by two optimistic assumptions: that the difference between the hint and the target gradient remains near constant, and that consecutive update directions change slowly due to smoothness. Our method eliminates the need for a double loop and removes the logarithmic factor. Furthermore, by simply replacing full gradients with stochastic gradients and under the standard assumption that their variance is bounded by $\sigma^2$, we obtain a unified algorithm with complexity $\mathcal{O}(\varepsilon^{-1.75} + \sigma^2 \varepsilon^{-3.5})$, smoothly interpolating between the best-known deterministic rate and the optimal stochastic rate.
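The abstract's key idea, using the gradient at an extrapolated point as the optimistic hint, can be illustrated with a toy sketch. The function, step size, and update rule below are hypothetical simplifications for intuition only, not the authors' actual algorithm or analysis:

```python
def grad(x):
    # Gradient of the 1-D nonconvex test function f(x) = (x**2 - 1)**2,
    # which has stationary points at x = 0 and x = +/- 1.
    return 4 * x * (x**2 - 1)

def extrapolated_hint_descent(x0, lr=0.05, steps=200):
    """Toy sketch: each step queries the gradient at an extrapolated
    point x + (x - x_prev), optimistically assuming the next
    displacement repeats the last one, and uses that gradient as the
    hint driving the linearized update (illustrative only)."""
    x_prev, x = x0, x0
    for _ in range(steps):
        x_hint = x + (x - x_prev)   # extrapolated point
        h = grad(x_hint)            # optimistic hint: gradient at extrapolation
        x_prev, x = x, x - lr * h   # update driven by the hint
    return x

x_out = extrapolated_hint_descent(x0=0.5)
print(abs(grad(x_out)))  # gradient magnitude is small near a stationary point
```

The extrapolation step is what makes the hint "optimistic": it bets that consecutive update directions change slowly, which is exactly the smoothness assumption the abstract invokes.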
Problem

Research questions and friction points this paper is trying to address.

Addressing limitations of the online-to-nonconvex conversion framework
Eliminating double-loop complexity and logarithmic factors in deterministic methods
Unifying deterministic and stochastic optimization under standard variance assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses doubly optimistic hint function for gradients
Eliminates double-loop scheme and logarithmic factor
Unifies deterministic and stochastic optimization complexities
Francisco Patitucci
Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
Ruichen Jiang
University of Texas at Austin (Optimization)
Aryan Mokhtari
UT Austin (Optimization, Machine Learning)