AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the manual tuning of learning rates in stochastic gradient descent (SGD), this paper proposes a preset-schedule-free, online adaptive learning rate mechanism. At each iteration, it makes a binary decision—increment or decrement the learning rate—based on the gradient’s temporal variation and local geometric properties of the objective function, with provable convergence guarantees. A deterministic counterpart for full-batch gradient descent is also derived. Unlike conventional approaches, the method requires no hyperparameter pre-specification or sliding-window statistics, achieving significantly improved convergence speed and robustness across diverse optimization problems and machine learning tasks—outperforming fixed-rate, decaying, and mainstream adaptive methods (e.g., Adam, AdaGrad). The core innovation lies in formulating learning rate adaptation as a theoretically grounded online binary decision process—the first such formulation with rigorous convergence analysis.
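The core idea — treating learning rate adaptation as a per-iteration binary decision — can be illustrated with a minimal sketch. Note this is a hypothetical toy rule (compare the loss after a step with a larger vs. a smaller rate and keep the better one), not the paper's actual AutoSGD criterion, which is based on the gradient's temporal variation and local geometry:

```python
def adaptive_lr_sgd(grad_fn, loss_fn, x0, lr=0.1, factor=2.0, n_iters=100):
    """Toy illustration of per-iteration binary learning-rate adaptation.

    At each step, candidate updates with an increased and a decreased
    learning rate are compared, and the one with lower loss is kept.
    (Hypothetical decision rule for illustration only; AutoSGD's actual
    criterion differs.)
    """
    x = list(x0)
    for _ in range(n_iters):
        g = grad_fn(x)
        # Binary decision: try a larger and a smaller learning rate.
        lr_up, lr_down = lr * factor, lr / factor
        x_up = [xi - lr_up * gi for xi, gi in zip(x, g)]
        x_down = [xi - lr_down * gi for xi, gi in zip(x, g)]
        if loss_fn(x_up) < loss_fn(x_down):
            x, lr = x_up, lr_up
        else:
            x, lr = x_down, lr_down
    return x, lr

# Example: minimize the quadratic f(x) = ||x||^2 / 2 (gradient is x itself).
x_opt, lr_final = adaptive_lr_sgd(
    grad_fn=lambda x: x,
    loss_fn=lambda x: 0.5 * sum(v * v for v in x),
    x0=[5.0, -3.0],
)
```

Even with a deliberately mis-specified initial rate, this kind of rule recovers a workable step size automatically, which is the tuning-free behavior the paper formalizes and analyzes.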

📝 Abstract
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration and then takes appropriate action. We introduce theory supporting the convergence of AutoSGD, along with its deterministic counterpart for standard gradient descent. Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
Problem

Research questions and friction points this paper is trying to address.

Automates learning rate adjustment in SGD
Reduces manual tuning effort for SGD
Ensures convergence with adaptive learning rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically adjusts SGD learning rate
Determines rate changes per iteration
Ensures convergence with theoretical support
Nikola Surjanovic
Department of Statistics, University of British Columbia
Alexandre Bouchard-Côté
Department of Statistics, University of British Columbia
Trevor Campbell
Associate Professor, Statistics, UBC
Machine Learning · Statistics · Optimization · Mathematics