🤖 AI Summary
Adaptive first-order optimization methods face challenges in balancing theoretical rigor with practical efficiency, particularly in non-convex settings where convergence guarantees and computational scalability remain limited.
Method: The paper builds on the Online Scaled Gradient Method (OSGM) framework, which integrates step-size adaptation mechanisms from online convex optimization into gradient updates. The resulting algorithm, OSGM-Best, uses a gradient-dependent online scaling strategy to select its learning rate adaptively.
Contribution/Results: OSGM-Best combines low memory overhead with cheap iterations, and the paper establishes, for the first time, convergence guarantees for OSGM in non-convex optimization, bridging a theoretical gap between adaptive gradient methods and quasi-Newton approaches. On multiple non-convex benchmarks, OSGM-Best matches the convergence speed of quasi-Newton methods without Hessian approximations or other second-order information, at lower computational cost and with better scalability. The work offers both a new theoretical perspective on adaptive optimization and a practical paradigm for it.
📝 Abstract
Part I of this work [Gao25] establishes online scaled gradient methods (OSGM), a framework that utilizes online convex optimization to adapt stepsizes in gradient methods. This paper focuses on the practical aspects of OSGM. We leverage the OSGM framework to design new adaptive first-order methods and provide insights into their empirical behavior. The resulting method, OSGM-Best, matches the performance of quasi-Newton variants while requiring less memory and cheaper iterations. We also extend OSGM to nonconvex optimization and outline directions that connect OSGM to existing branches of optimization theory and practice.
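As a rough illustration of the stepsize-adaptation idea behind OSGM, the sketch below runs gradient descent on a toy quadratic while a scalar stepsize is itself updated by an online gradient step on the one-step surrogate f(x_k − η·g_k). This is a simplified assumption for illustration only, not the paper's OSGM-Best (which uses a gradient-dependent online scaling, not a plain scalar hypergradient); the objective, initial stepsize, and online learning rate are invented for the demo.

```python
import numpy as np

# Illustrative sketch of online stepsize adaptation in the spirit of OSGM.
# NOTE: this is NOT the paper's OSGM-Best; the surrogate, the scalar (rather
# than matrix) scaling, and all constants below are assumptions for the demo.

A = np.diag([1.0, 10.0, 100.0])  # ill-conditioned quadratic: f(x) = x'Ax/2

def f(x):
    return 0.5 * x @ (A @ x)

def grad(x):
    return A @ x

x = np.ones(3)
eta = 1e-3   # stepsize, adapted online
lr = 1e-6    # learning rate of the online stepsize update (assumed value)

for _ in range(200):
    g = grad(x)
    x_next = x - eta * g
    # Online feedback: the derivative of the one-step surrogate
    #   ell(eta) = f(x - eta * g)
    # with respect to eta is  -g . grad(f)(x_next).
    hyper = -g @ grad(x_next)
    eta = max(eta - lr * hyper, 0.0)  # online gradient step on the stepsize
    x = x_next

loss = f(x)
print(loss, eta)  # loss shrinks while eta adapts toward a larger stepsize
```

Even this crude scalar version shows the appeal: the fixed stepsize 1e-3 alone would make little progress on the stiff coordinate, while the online update pushes eta toward a workable value using only gradient information, with no second-order computation.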