Accelerating Proximal Gradient Descent via Silver Stepsizes

📅 2024-12-07
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Topic: accelerating projected/proximal gradient methods for constrained and composite convex optimization, in particular whether convergence can be improved by step-size scheduling alone, without momentum. Method: the silver stepsize schedule, built from the silver ratio ρ = 1 + √2, analyzed by combining a Laplacian-structured sum-of-squares (SOS) certificate for proximal updates with a recursive gluing (concatenation) technique. Results: the paper proves that this schedule achieves accelerated rates of O(ε^(−log_ρ 2)) for smooth convex objectives and O(κ^(log_ρ 2) log(1/ε)) for strongly convex objectives with condition number κ, matching the silver convergence rate of vanilla gradient descent and substantially outperforming fixed step sizes; these rates are conjectured to be asymptotically optimal among all stepsize schedules. The analysis shows that judicious step-size design, with no momentum at all, suffices for acceleration in proximal and projected gradient methods.

📝 Abstract
Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum -- just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2023), namely $O(\varepsilon^{-\log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log \frac{1}{\varepsilon})$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing -- the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes -- with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.
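The schedule in the abstract is concrete enough to sketch. Below is a minimal Python illustration, not the authors' code: it assumes the recursive concatenation form of the silver schedule from Altschuler and Parrilo (h⁽¹⁾ = [√2], with middle step 1 + ρ^(j−1) at each doubling), and applies it to proximal gradient descent on a lasso-type composite objective. Function names such as `prox_grad_silver` are mine, chosen for illustration.

```python
import numpy as np

RHO = 1 + np.sqrt(2)  # the silver ratio

def silver_schedule(k):
    """Silver stepsize schedule of length 2^k - 1, assumed to follow the
    recursive concatenation h^(j+1) = h^(j) + [1 + RHO**(j-1)] + h^(j),
    so h^(1) = [sqrt(2)], h^(2) = [sqrt(2), 2, sqrt(2)], ..."""
    h = []
    for j in range(k):
        h = h + [1 + RHO ** (j - 1)] + h
    return h

def soft_threshold(x, t):
    """Proximal map of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_silver(A, b, lam, k):
    """Proximal gradient descent on the composite objective
    0.5 * ||Ax - b||^2 + lam * ||x||_1, with silver stepsizes scaled
    by 1/L, where L = ||A||_2^2 is the smoothness constant."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for h in silver_schedule(k):
        t = h / L  # silver steps are multiples of the "safe" step 1/L
        x = soft_threshold(x - t * A.T @ (A @ x - b), t * lam)
    return x
```

Note that some silver steps are much longer than the classical bound 2/L, so individual iterations may transiently increase the objective; the paper's guarantee concerns the iterate after the full schedule.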
Problem

Research questions and friction points this paper is trying to address.

Accelerating gradient descent without momentum via stepsizes
Extending stepsize-based acceleration to constrained optimization
Proving optimality of silver stepsize schedule convergence rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Silver stepsize schedule accelerates proximal gradient descent
Recursive gluing technique enables time-varying stepsize analysis
Laplacian-structured sum-of-squares certifies proximal updates
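The bullets above can be made quantitative with a short numeric check. Assuming the same recursive gluing construction as before (each doubling of the schedule inserts one long step 1 + ρ^(k−1)), the total stepsize S_k over n = 2^k − 1 iterations grows like n^(log₂ ρ) ≈ n^1.2716, and suboptimality of gradient descent scales like 1/S, which is exactly the accelerated rate O(ε^(−log_ρ 2)). A sketch, with the recursion treated as an assumption:

```python
import math

RHO = 1 + math.sqrt(2)  # silver ratio

def total_stepsize(k):
    """Total stepsize S_k of the length-(2^k - 1) silver schedule under
    the gluing recursion S_{j+1} = 2 * S_j + (1 + RHO**(j-1)), S_0 = 0."""
    S = 0.0
    for j in range(k):
        S = 2 * S + 1 + RHO ** (j - 1)
    return S

k = 25
n = 2 ** k - 1
# S_k / n^{log2(RHO)} tends to a constant, so error ~ 1/S ~ n^{-1.2716},
# i.e. n ~ eps^{-log_RHO 2} iterations suffice for precision eps.
ratio = total_stepsize(k) / n ** math.log2(RHO)
```

The exponent log₂ ρ ≈ 1.2716 > 1 is the same acceleration exponent appearing in both rates quoted in the abstract.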
Jinho Bok
Department of Statistics and Data Science, University of Pennsylvania
Jason M. Altschuler
UPenn
Optimization · Machine Learning · Mathematics of Data Science · Optimal Transport