Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization

📅 2025-07-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tension in adaptive gradient methods between achieving optimal convergence rates and adapting to local curvature. To resolve it, the authors incorporate Nesterov acceleration into the GRAAL adaptive framework, eliminating the need for line search or hyperparameter tuning. The method combines momentum updates, local curvature estimation, and nonlinear extrapolation, and remains convergent even from arbitrarily small initial stepsizes. For Lipschitz-smooth convex objectives, it provably achieves the optimal $O(1/k^2)$ convergence rate, paying only a logarithmic additive term in the iteration complexity. In contrast to existing accelerated adaptive methods, whose ability to adapt to curvature is limited, this approach unifies optimal convergence guarantees with strong adaptivity, ensuring both theoretical optimality and practical robustness across diverse problem geometries.
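The overall mechanism can be sketched roughly as follows: estimate an inverse local Lipschitz constant from successive gradients and combine it with Nesterov-style momentum. This is an illustrative simplification, not the paper's exact accelerated GRAAL updates; the function name, the doubling growth factor, and the extrapolation rule below are placeholder assumptions.

```python
import numpy as np

def adaptive_accelerated_gd(grad, x0, iters=500, lam0=1e-8):
    """Illustrative sketch only: Nesterov-style momentum plus a stepsize
    estimated from local curvature, in the spirit of the paper. The exact
    stepsize and extrapolation rules of accelerated GRAAL differ; the
    growth factor 2.0 and the 1/(2L) estimate are placeholder choices."""
    x = np.asarray(x0, dtype=float)
    y, t, lam = x.copy(), 1.0, lam0        # lam0 may be excessively small
    y_prev, g_prev = y.copy(), grad(y)
    for _ in range(iters):
        g = grad(y)
        dx, dg = y - y_prev, g - g_prev
        ndg = np.linalg.norm(dg)
        if ndg > 0:
            # inverse local Lipschitz estimate ||dx|| / (2 ||dg||),
            # allowed to grow by at most a factor of 2 per iteration
            lam = min(2.0 * lam, np.linalg.norm(dx) / (2.0 * ndg))
        x_new = y - lam * g                # gradient step at extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y_prev, g_prev = y, g
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

Starting from `lam0 = 1e-8`, the stepsize doubles until it reaches the curvature-matched level, so only a logarithmic number of iterations is lost to recovery.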

📝 Abstract
In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function $\min_x f(x)$. Recently, several adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. These methods estimate the local curvature of the objective function to compute stepsizes, attain the standard convergence rate $\mathcal{O}(1/k)$ of fixed-stepsize gradient descent for Lipschitz-smooth functions, and do not require any line search procedures or hyperparameter tuning. However, a natural question arises: is it possible to accelerate the convergence of these algorithms to match the optimal rate $\mathcal{O}(1/k^2)$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made (Li and Lan, 2023), the capabilities of the existing accelerated algorithms to adapt to the curvature of the objective function are highly limited. Consequently, we provide a positive answer to this question and develop GRAAL with Nesterov acceleration. We prove that our algorithm achieves the desired optimal convergence rate for Lipschitz smooth functions. Moreover, in contrast to existing methods, it does so with an arbitrary, even excessively small, initial stepsize at the cost of a logarithmic additive term in the iteration complexity.
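For reference, the fixed-stepsize baseline whose rate the paper matches is Nesterov's (1983) accelerated gradient descent. A standard sketch of it is below; note that it requires the Lipschitz constant $L$ as an input, which is exactly the non-adaptivity the paper removes.

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=200):
    """Standard Nesterov accelerated gradient descent with fixed stepsize
    1/L (L = Lipschitz constant of grad f). Attains the optimal O(1/k^2)
    rate on smooth convex f, but L must be known in advance."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(iters):
        x_new = y - grad(y) / L                        # gradient step at extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

On a quadratic with known $L$, the guarantee $f(x_k) - f^\star \le 2L\|x_0 - x^\star\|^2/(k+1)^2$ applies; the paper's contribution is obtaining the same rate without ever being told $L$.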
Problem

Research questions and friction points this paper is trying to address.

Accelerating adaptive gradient methods to optimal O(1/k²) rate
Achieving curvature adaptation without hyperparameter tuning
Maintaining convergence with arbitrary initial stepsize choices
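The third point above has a simple quantitative flavor: if the stepsize is allowed to grow geometrically per iteration (a factor-of-2 growth is an assumed illustration here; the paper's actual rule differs), then climbing from an excessively small initial value $\lambda_0$ to a curvature-matched level $\lambda^\star$ costs only $O(\log(\lambda^\star/\lambda_0))$ iterations, consistent with the logarithmic additive term in the complexity.

```python
import math

def recovery_iterations(lam0, lam_target, growth=2.0):
    """Iterations needed for a geometrically growing stepsize to climb from
    an excessively small lam0 to the curvature-matched lam_target. The
    growth factor is an assumed illustration, not the paper's rule."""
    return math.ceil(math.log(lam_target / lam0, growth))
```

Even a stepsize eight orders of magnitude too small is recovered in a few dozen iterations, which is why the arbitrary-initial-stepsize guarantee is cheap.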
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRAAL with Nesterov acceleration for optimal convergence
Adaptive gradient methods estimate local curvature
No line search or hyperparameter tuning needed