Gradient Methods with Online Scaling Part I. Theoretical Foundations

📅 2025-05-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses adaptive step-size selection in first-order optimization by proposing the Online Scaled Gradient Method (OSGM), which learns dynamic step sizes online via a convergence-driven feedback function, yielding provable acceleration. Its main contributions are threefold: (1) it introduces a new family of first-order methods with non-asymptotic superlinear convergence; (2) it unifies and theoretically justifies heuristic approaches such as hypergradient descent; (3) for smooth convex problems it ensures trajectory-dependent global convergence, and for smooth strongly convex problems it achieves improved iteration complexity over classical methods and attains local superlinear convergence, with asymptotic performance no worse than that of the optimal fixed step size.

📝 Abstract
This paper establishes the theoretical foundations of online scaled gradient methods (OSGM), a framework that utilizes online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of a stepsize by a feedback function motivated by a convergence measure and uses the feedback to adjust the stepsize through an online learning algorithm. Consequently, instantiations of OSGM achieve convergence rates that are asymptotically no worse than those of the optimal stepsize. OSGM yields desirable convergence guarantees on smooth convex problems, including (1) trajectory-dependent global convergence on smooth convex objectives; (2) an improved complexity result on smooth strongly convex problems; and (3) local superlinear convergence. Notably, OSGM constitutes a new family of first-order methods with non-asymptotic superlinear convergence, joining the celebrated quasi-Newton methods. Finally, OSGM explains the empirical success of the popular hypergradient-descent heuristic in optimization for machine learning.
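The mechanism described in the abstract, a feedback function that scores the current stepsize and an online learning update applied to that stepsize, can be illustrated with a minimal scalar sketch in the spirit of the hypergradient-descent heuristic the paper analyzes. The function names, the online learning rate `lr`, and the quadratic test problem below are illustrative assumptions, not the paper's exact algorithm (which learns a scaling matrix and comes with online-learning regret guarantees).

```python
# Minimal sketch of online stepsize scaling in the spirit of OSGM /
# hypergradient descent. All names and constants here are illustrative
# assumptions, not the paper's exact method.
import numpy as np

def osgm_sketch(grad, x0, eta0=0.1, lr=0.01, iters=100):
    """Gradient descent whose scalar stepsize eta is itself learned
    online from a hypergradient-style feedback signal."""
    x, eta = x0.copy(), eta0
    for _ in range(iters):
        g = grad(x)
        x_next = x - eta * g
        # Feedback: the derivative of f(x - eta * g) with respect to eta
        # is -<grad(x_next), g>; a negative value suggests eta should grow.
        hypergrad = -np.dot(grad(x_next), g)
        # Online gradient step on the stepsize itself, kept positive.
        eta = max(eta - lr * hypergrad, 1e-8)
        x = x_next
    return x, eta

# Usage: minimize f(x) = 0.5 * x^T A x on a simple ill-conditioned quadratic.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_star, eta_star = osgm_sketch(grad, np.array([1.0, 1.0]))
```

The key design point this sketch shares with the paper is that the stepsize update is driven by a scalar feedback signal evaluated along the iterate trajectory, so a well-tuned stepsize emerges from the run itself rather than from offline tuning.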
Problem

Research questions and friction points this paper is trying to address.

How to select stepsizes for first-order methods adaptively, with provable guarantees rather than heuristics
Whether adaptive stepsizes can asymptotically match the optimal fixed stepsize
Whether first-order methods can attain non-asymptotic superlinear convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning adapts stepsizes for acceleration
Feedback function quantifies stepsize effectiveness
Achieves non-asymptotic superlinear convergence rates
🔎 Similar Papers
No similar papers found.