🤖 AI Summary
This paper addresses three key limitations of existing online-to-nonconvex conversion frameworks for nonconvex optimization: (i) deterministic methods rely on double loops and fixed-point subroutines, incurring extra logarithmic factors; (ii) stochastic methods require strong boundedness assumptions on second-order gradient moments; and (iii) the deterministic and stochastic settings are treated separately with incompatible algorithms. To overcome these, we propose a unified *doubly optimistic gradient* framework. Its core innovation is a *doubly optimistic hint function*, which leverages extrapolated gradients and gradient smoothness to embed double optimism directly into the linearized updates. This eliminates double loops, weakens the stochastic-gradient assumption to standard bounded variance (without requiring second-moment bounds), and unifies algorithm design across the deterministic and stochastic settings. Theoretically, the method achieves the optimal deterministic complexity $\mathcal{O}(\varepsilon^{-1.75})$ and stochastic complexity $\mathcal{O}(\varepsilon^{-3.5})$, both free of extraneous logarithmic factors.
📝 Abstract
A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of \cite{cutkosky2023optimal}, which reformulates the task of finding an $\varepsilon$-first-order stationary point as an online learning problem. When both the gradient and the Hessian are Lipschitz continuous, instantiating this framework with two different online learners achieves a complexity of $\mathcal{O}(\varepsilon^{-1.75}\log(1/\varepsilon))$ in the deterministic case and a complexity of $\mathcal{O}(\varepsilon^{-3.5})$ in the stochastic case. However, this approach suffers from several limitations: (i) the deterministic method relies on a complex double-loop scheme that solves a fixed-point equation to construct hint vectors for an optimistic online learner, introducing an extra logarithmic factor; (ii) the stochastic method assumes a bounded second-order moment of the stochastic gradient, which is stronger than standard variance bounds; and (iii) different online learning algorithms are used in the two settings. In this paper, we address these issues by introducing an online optimistic gradient method based on a novel \textit{doubly optimistic hint function}. Specifically, we use the gradient at an extrapolated point as the hint, motivated by two optimistic assumptions: that the difference between the hint and the target gradient remains nearly constant, and that consecutive update directions change slowly due to smoothness. Our method eliminates the need for a double loop and removes the logarithmic factor. Furthermore, by simply replacing full gradients with stochastic gradients, and under the standard assumption that their variance is bounded by $\sigma^2$, we obtain a unified algorithm with complexity $\mathcal{O}(\varepsilon^{-1.75} + \sigma^2 \varepsilon^{-3.5})$, smoothly interpolating between the best-known deterministic rate and the optimal stochastic rate.
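To make the mechanism concrete, the following schematic sketches the online-to-nonconvex template with an extrapolated-gradient hint fed to an optimistic online learner. The notation ($\Delta_t$ for the learner's play, $D$ for the movement radius, $\eta$ for the step size) is illustrative: this is a standard optimistic-online-gradient form under the hint choice described in the abstract, not necessarily the paper's exact update.

```latex
\begin{align*}
  x_t &= x_{t-1} + \Delta_t,
      && \text{$\Delta_t$ played by the online learner,} \\
  g_t &= \nabla f(x_t),
      && \text{linearized loss } \ell_t(\Delta) = \langle g_t, \Delta \rangle, \\
  h_{t+1} &= \nabla f(x_t + \Delta_t),
      && \text{hint: gradient at the extrapolated point,} \\
  \tilde{\Delta}_{t+1} &= \Pi_{\|\Delta\| \le D}\big(\tilde{\Delta}_t - \eta\, g_t\big),
  \quad
  \Delta_{t+1} = \Pi_{\|\Delta\| \le D}\big(\tilde{\Delta}_{t+1} - \eta\, h_{t+1}\big).
\end{align*}
```

The point of optimism is that the learner's regret scales with $\sum_t \|g_{t+1} - h_{t+1}\|^2$ rather than $\sum_t \|g_{t+1}\|^2$; the two assumptions named in the abstract (near-constant hint error and slowly varying directions under smoothness) are what keep this quantity small without a fixed-point subroutine.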