Stochastic Adaptive Gradient Descent Without Descent

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the reliance of stochastic gradient descent (SGD) on manually tuned step sizes in convex stochastic optimization, this paper proposes a fully hyperparameter-free adaptive step-size strategy. The method accesses only first-order stochastic gradient information via an oracle and dynamically captures the local geometric structure of the objective function, marking the first successful extension of “descent-free” adaptive gradient ideas to the stochastic optimization setting. Theoretically, under standard convexity and gradient-variance assumptions, the authors establish a rigorous $O(1/\sqrt{T})$ convergence rate. Empirically, the method matches the performance of carefully tuned baseline algorithms across diverse tasks, including logistic regression, neural network training, and robust optimization, while requiring no hyperparameter configuration. This eliminates manual tuning overhead and significantly enhances algorithmic robustness, reproducibility, and practical applicability.
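For intuition, the deterministic Adaptive Gradient Descent Without Descent rule of Malitsky and Mishchenko estimates a local step size from successive iterates and gradients. The sketch below applies that rule naively to stochastic gradients; the function names (`sgd_adgd`, `grad_oracle`), the initialization, and the `eps` safeguard are illustrative assumptions, not the paper's actual stochastic variant.

```python
import numpy as np

def sgd_adgd(grad_oracle, x0, n_steps, lam0=1e-6, eps=1e-12):
    """SGD with an AdGD-style adaptive step size (illustrative sketch only)."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad_oracle(x_prev)        # first stochastic gradient
    lam_prev, theta = lam0, np.inf      # tiny initial step; first growth cap inactive
    x = x_prev - lam_prev * g_prev
    for _ in range(n_steps):
        g = grad_oracle(x)              # fresh stochastic gradient at the current point
        # Inverse local-curvature estimate from successive iterates and gradients.
        local = np.linalg.norm(x - x_prev) / (2.0 * (np.linalg.norm(g - g_prev) + eps))
        # Cap how fast the step size may grow relative to the previous one.
        lam = min(np.sqrt(1.0 + theta) * lam_prev, local)
        x_prev, g_prev = x, g
        x = x - lam * g                 # plain SGD step with the adaptive step size
        theta, lam_prev = lam / lam_prev, lam
    return x
```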

📝 Abstract
We introduce a new adaptive step-size strategy for convex optimization with stochastic gradients that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
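For orientation, the convergence guarantee described above typically takes the following form for convex objectives with bounded gradient variance; the precise constants and assumptions under which the paper establishes it may differ:

$$\mathbb{E}\bigl[f(\bar{x}_T) - f(x^\star)\bigr] \;\le\; \frac{C}{\sqrt{T}}, \qquad \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t,$$

where $x^\star$ is a minimizer of $f$ and $C$ depends on the initial distance $\|x_0 - x^\star\|$ and on the variance of the stochastic gradients.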
Problem

Research questions and friction points this paper is trying to address.

Introduces adaptive step-size for stochastic convex optimization
Uses first-order oracle without hyper-parameter tuning
Proves convergence and competes against tuned baselines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic adaptive step-size strategy
No hyper-parameter tuning required
First-order stochastic oracle utilization (see the usage sketch below)
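Assuming the `sgd_adgd` sketch shown after the summary above, a hypothetical run on a minibatch least-squares problem could look as follows; the data, batch size, and step count are made up for illustration.

```python
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))      # synthetic design matrix
b = rng.normal(size=1000)            # synthetic targets

def grad_oracle(x):
    # Unbiased minibatch gradient of the loss 0.5 * ||A @ x - b||^2 / n.
    idx = rng.integers(0, A.shape[0], size=32)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / len(idx)

x_hat = sgd_adgd(grad_oracle, x0=np.zeros(20), n_steps=5000)
```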
Jean-François Aujol
Univ. Bordeaux, CNRS, Bordeaux INP, IMB, UMR 5251, F-33400 Talence, France
Jérémie Bigot
Université de Bordeaux
Statistics, Signal and Image Processing
Camille Castera
Univ. Bordeaux, CNRS, Bordeaux INP, IMB, UMR 5251, F-33400 Talence, France