🤖 AI Summary
This work addresses an open question in convex optimization under generalized smoothness, specifically $(L_0, L_1)$-smoothness: whether accelerated gradient methods can achieve the optimal complexity $O(\sqrt{\ell(0)}\, R / \sqrt{\varepsilon})$ for small error tolerances $\varepsilon$, where $R$ is the distance from the starting point to an optimum. We propose a novel first-order accelerated algorithm and develop a Lyapunov function framework tailored to generalized smoothness. For the first time, we rigorously attain this optimal complexity bound under $(L_0, L_1)$-smoothness, eliminating the exponential factors and extraneous dependencies present in prior approaches. Our analysis yields tight convergence rates and establishes a concise, scalable paradigm for designing and analyzing accelerated methods under broad smoothness assumptions. The result is theoretically optimal and significantly advances the understanding of acceleration beyond standard $L$-smoothness.
📝 Abstract
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^{2} f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes both $L$-smoothness and $(L_{0},L_{1})$-smoothness. While accelerated gradient descent (AGD) is known to reach the optimal complexity $O(\sqrt{L}\, R / \sqrt{\varepsilon})$ under $L$-smoothness, where $\varepsilon$ is the error tolerance and $R$ is the distance between the starting point and an optimal point, existing extensions to $\ell$-smoothness either incur an extra dependence on the initial gradient, suffer exponential factors in $L_{1} R$, or require costly auxiliary sub-routines, leaving open whether an AGD-type $O(\sqrt{\ell(0)}\, R / \sqrt{\varepsilon})$ rate is achievable for small $\varepsilon$, even in the $(L_{0},L_{1})$-smoothness case.
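As a concrete instance (using the standard definition of $(L_0,L_1)$-smoothness from the literature, stated here for context rather than taken from this abstract), the affine choice $\ell(t) = L_0 + L_1 t$ recovers $(L_0,L_1)$-smoothness, while the constant choice $\ell(t) \equiv L$ recovers classical $L$-smoothness:

$$\|\nabla^{2} f(x)\| \;\le\; L_0 + L_1 \|\nabla f(x)\| \qquad \text{and} \qquad \|\nabla^{2} f(x)\| \;\le\; L.$$

In particular, $\ell(0) = L_0$ under the affine choice, which is the constant that appears in the target rate below.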
We resolve this open question. Leveraging a new Lyapunov function and designing new algorithms, we achieve $O(\sqrt{\ell(0)}\, R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For instance, for $(L_{0},L_{1})$-smoothness, our bound $O(\sqrt{L_0}\, R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.
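To spell out the specialization (a short sanity-check derivation under the affine choice $\ell(t) = L_0 + L_1 t$ above, not a restatement of the paper's proof): since $\ell(0) = L_0$,

$$O\!\left(\sqrt{\ell(0)}\, R / \sqrt{\varepsilon}\right) \;=\; O\!\left(\sqrt{L_0}\, R / \sqrt{\varepsilon}\right),$$

and because every $L_0$-smooth convex function is also $(L_0, L_1)$-smooth, the classical $\Omega(\sqrt{L_0}\, R / \sqrt{\varepsilon})$ lower bound for $L_0$-smooth convex optimization carries over to the broader class, which is why no first-order method can improve on this rate in the small-$\varepsilon$ regime.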