🤖 AI Summary
Adaptive first-order optimization methods face challenges in balancing theoretical rigor with practical efficiency, particularly in non-convex settings where convergence guarantees and computational scalability remain limited.
Method: The paper builds on the Online Scaled Gradient Method (OSGM) framework, which integrates step-size adaptation mechanisms from online convex optimization into gradient updates. The resulting algorithm, OSGM-Best, uses a gradient-dependent online scaling strategy to select its learning rate adaptively.
Contribution/Results: OSGM-Best combines low memory overhead with cheap iterations, and the paper establishes, for the first time, convergence guarantees for OSGM in non-convex optimization, bridging a theoretical gap between adaptive gradient methods and quasi-Newton approaches. On multiple non-convex benchmarks, OSGM-Best matches the convergence speed of quasi-Newton methods without Hessian approximations or other second-order information, at lower computational cost and with better scalability. The work offers both a new theoretical perspective on adaptive optimization and a practical paradigm for it.
📝 Abstract
Part I of this work [Gao25] establishes online scaled gradient methods (OSGM), a framework that utilizes online convex optimization to adapt stepsizes in gradient methods. This paper focuses on the practical aspects of OSGM. We leverage the OSGM framework to design new adaptive first-order methods and provide insights into their empirical behavior. The resulting method, OSGM-Best, matches the performance of quasi-Newton variants while requiring less memory and cheaper iterations. We also extend OSGM to nonconvex optimization and outline directions that connect OSGM to existing branches of optimization theory and practice.
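As a rough illustration of the stepsize-adaptation idea behind OSGM, the sketch below runs gradient descent on a toy quadratic while a scalar stepsize is itself updated by an online gradient step on the one-step surrogate f(x_k − η·g_k). This is a simplified assumption for illustration only, not the paper's OSGM-Best (which uses a gradient-dependent online scaling, not a plain scalar hypergradient); the objective, initial stepsize, and online learning rate are invented for the demo.

```python
import numpy as np

# Illustrative sketch of online stepsize adaptation in the spirit of OSGM.
# NOTE: this is NOT the paper's OSGM-Best; the surrogate, the scalar (rather
# than matrix) scaling, and all constants below are assumptions for the demo.

A = np.diag([1.0, 10.0, 100.0])  # ill-conditioned quadratic: f(x) = x'Ax/2

def f(x):
    return 0.5 * x @ (A @ x)

def grad(x):
    return A @ x

x = np.ones(3)
eta = 1e-3   # stepsize, adapted online
lr = 1e-6    # learning rate of the online stepsize update (assumed value)

for _ in range(200):
    g = grad(x)
    x_next = x - eta * g
    # Online feedback: the derivative of the one-step surrogate
    #   ell(eta) = f(x - eta * g)
    # with respect to eta is  -g . grad(f)(x_next).
    hyper = -g @ grad(x_next)
    eta = max(eta - lr * hyper, 0.0)  # online gradient step on the stepsize
    x = x_next

loss = f(x)
print(loss, eta)  # loss shrinks while eta adapts toward a larger stepsize
```

Even this crude scalar version shows the appeal: the fixed stepsize 1e-3 alone would make little progress on the stiff coordinate, while the online update pushes eta toward a workable value using only gradient information, with no second-order computation.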