🤖 AI Summary
This work addresses the challenge of applying the stochastic Polyak stepsize (SPS) directly to nonsmooth optimization, particularly in nonconvex settings, where standard SPS fails due to numerical instability and a lack of theoretical guarantees. We propose SPS$_{safe}$, a stochastic subgradient method that incorporates both a safeguard mechanism and momentum acceleration. Unlike prior SPS variants, SPS$_{safe}$ requires no interpolation assumption, no knowledge of the optimal objective value, and no strong convexity, extending SPS for the first time to general nonsmooth convex and nonconvex optimization. Theoretically, under standard bounded-subgradient assumptions, SPS$_{safe}$ enjoys stable convergence guarantees. Empirically, it suppresses the stepsize oscillations and numerical instability that arise from small gradients, markedly reducing the variance of the iterates. In deep neural network training, SPS$_{safe}$ mitigates vanishing gradients, converges faster, and is more robust than mainstream adaptive methods such as Adam and AdaGrad.
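The summary does not spell out the update rule, so the following is only a rough Python sketch of what a safeguarded Polyak stepsize with heavy-ball momentum can look like. The function name, the cap `gamma_max`, the `eps` denominator term, and the use of a known lower bound `f_star` are illustrative assumptions, not the paper's actual SPS$_{safe}$ rule.

```python
import numpy as np

def sps_safe_step(x, f_i, grad_i, f_star=0.0, gamma_max=1.0,
                  eps=1e-8, momentum=0.9, buf=None):
    """One hypothetical safeguarded-Polyak subgradient step.

    Sketch only: combines the classical Polyak stepsize
    (f_i(x) - f_star) / ||g||^2 with two common safeguards --
    an upper bound gamma_max and an eps term in the denominator
    that prevents blow-up when the subgradient is tiny.
    """
    g = grad_i(x)                       # stochastic subgradient at x
    gamma = min(gamma_max, (f_i(x) - f_star) / (np.dot(g, g) + eps))
    step = gamma * g
    if buf is not None:                 # optional heavy-ball momentum
        step = step + momentum * buf
    return x - step, step               # return new iterate and buffer

# Minimal usage on f(x) = ||x||^2 (f_star = 0), without momentum:
x = np.array([3.0, 4.0])
for _ in range(20):
    x, _ = sps_safe_step(x, lambda z: float(z @ z), lambda z: 2.0 * z)
```

Threading the returned `step` back in as `buf` on the next call gives the momentum variant; near a zero subgradient the `eps` term keeps the stepsize finite instead of dividing by (almost) zero.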
📝 Abstract
The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or on knowledge of the optimal objective value. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization without strong assumptions. We further incorporate momentum into the update rule and obtain equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate the theory: the proposed step size accelerates convergence, reduces variance, and consistently outperforms existing adaptive baselines. Finally, in deep neural network training, our method demonstrates robust performance by mitigating the vanishing gradient problem.