AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the manual tuning of learning rates in stochastic gradient descent (SGD), this paper proposes a preset-schedule-free, online adaptive learning rate mechanism. At each iteration, it makes a binary decision—increment or decrement the learning rate—based on the gradient’s temporal variation and local geometric properties of the objective function, with provable convergence guarantees. A deterministic counterpart for full-batch gradient descent is also derived. Unlike conventional approaches, the method requires no hyperparameter pre-specification or sliding-window statistics, achieving significantly improved convergence speed and robustness across diverse optimization problems and machine learning tasks—outperforming fixed-rate, decaying, and mainstream adaptive methods (e.g., Adam, AdaGrad). The core innovation lies in formulating learning rate adaptation as a theoretically grounded online binary decision process—the first such formulation with rigorous convergence analysis.
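The core idea — treating learning rate adaptation as a per-iteration binary decision — can be illustrated with a minimal sketch. Note this is a hypothetical toy rule (compare the loss after a step with a larger vs. a smaller rate and keep the better one), not the paper's actual AutoSGD criterion, which is based on the gradient's temporal variation and local geometry:

```python
def adaptive_lr_sgd(grad_fn, loss_fn, x0, lr=0.1, factor=2.0, n_iters=100):
    """Toy illustration of per-iteration binary learning-rate adaptation.

    At each step, candidate updates with an increased and a decreased
    learning rate are compared, and the one with lower loss is kept.
    (Hypothetical decision rule for illustration only; AutoSGD's actual
    criterion differs.)
    """
    x = list(x0)
    for _ in range(n_iters):
        g = grad_fn(x)
        # Binary decision: try a larger and a smaller learning rate.
        lr_up, lr_down = lr * factor, lr / factor
        x_up = [xi - lr_up * gi for xi, gi in zip(x, g)]
        x_down = [xi - lr_down * gi for xi, gi in zip(x, g)]
        if loss_fn(x_up) < loss_fn(x_down):
            x, lr = x_up, lr_up
        else:
            x, lr = x_down, lr_down
    return x, lr

# Example: minimize the quadratic f(x) = ||x||^2 / 2 (gradient is x itself).
x_opt, lr_final = adaptive_lr_sgd(
    grad_fn=lambda x: x,
    loss_fn=lambda x: 0.5 * sum(v * v for v in x),
    x0=[5.0, -3.0],
)
```

Even with a deliberately mis-specified initial rate, this kind of rule recovers a workable step size automatically, which is the tuning-free behavior the paper formalizes and analyzes.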

📝 Abstract
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration and then takes appropriate action. We introduce theory supporting the convergence of AutoSGD, along with its deterministic counterpart for standard gradient descent. Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
Problem

Research questions and friction points this paper is trying to address.

Automates learning rate adjustment in SGD
Reduces manual tuning effort for SGD
Ensures convergence with adaptive learning rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically adjusts SGD learning rate
Determines rate changes per iteration
Ensures convergence with theoretical support
Nikola Surjanovic
Department of Statistics, University of British Columbia
Alexandre Bouchard-Côté
Department of Statistics, University of British Columbia
Trevor Campbell
Associate Professor, Statistics, UBC
Machine Learning · Statistics · Optimization · Mathematics