🤖 AI Summary
First-order optimization methods such as gradient descent rely heavily on manual learning rate tuning, which is especially impractical in nested optimization settings. This paper proposes AutoGD, a fully automated, prior-free gradient descent algorithm with adaptive step sizes. Its core mechanism dynamically adjusts the step length based on gradient variation across iterations, and is theoretically proven to recover the optimal convergence rates of standard gradient descent for both smooth convex and nonconvex functions. The authors further extend this adaptive principle to the quasi-Newton framework, yielding AutoBFGS and AutoLBFGS. Empirical evaluations demonstrate that AutoGD and its variants consistently outperform fixed-step methods and mainstream adaptive optimizers, including Adam and AdaGrad, across classical optimization benchmarks and variational inference tasks. The proposed methods combine rigorous theoretical guarantees with broad practical applicability.
📝 Abstract
The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate learning rate schedule. When such methods appear as inner loops of other algorithms, expecting the user to tune the learning rates may be impractical. To address this, we introduce AutoGD: a gradient descent method that automatically determines whether to increase or decrease the learning rate at a given iteration. We establish the convergence of AutoGD, and show that we can recover the optimal rate of GD (up to a constant) for a broad class of functions without knowledge of smoothness constants. Experiments on a variety of traditional problems and variational inference optimization tasks demonstrate strong performance of the method, along with its extensions to AutoBFGS and AutoLBFGS.
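The idea of automatically deciding whether to grow or shrink the learning rate at each iteration can be sketched as follows. This is a minimal illustration, not the paper's actual AutoGD rule: the function name, the multiplicative constants, and the accept/reject test on the objective value are all assumptions made for exposition.

```python
import numpy as np

def auto_gd(f, grad, x0, lr=1.0, up=2.0, down=0.5, max_iter=200, tol=1e-8):
    """Gradient descent with a multiplicatively adapted step size.

    Illustrative sketch only: AutoGD's actual adjustment rule is based on
    gradient variation across iterations; here a simple heuristic grows the
    step after an accepted move and shrinks it after a rejected one.
    """
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # stationary point reached
            break
        x_new = x - lr * g
        f_new = f(x_new)
        if f_new < fx:   # accepted: take the step, try a larger one next
            x, fx = x_new, f_new
            lr *= up
        else:            # rejected: keep the iterate, shrink the step
            lr *= down
    return x, fx

# Usage: minimize the quadratic f(x) = ||x - 1||^2 from a distant start.
f = lambda x: float(np.sum((x - 1.0) ** 2))
grad = lambda x: 2.0 * (x - 1.0)
x_star, f_star = auto_gd(f, grad, np.array([5.0, -3.0]))
```

No smoothness constant is supplied by the user: an overly large step is simply rejected and halved, which mirrors the prior-free spirit of the method at the cost of extra function evaluations.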