Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Stochastic Gradient Descent (SGD) lacks a reliable, time-uniform stopping criterion for convex optimization: existing rules either lack probabilistic guarantees or depend on restrictive assumptions. Method: We propose the first time-uniform ε-optimality stopping rule, constructing confidence sequences valid at *any* stopping time via nonnegative supermartingales. Our approach integrates projected SGD, a weighted-averaging suboptimality analysis, and stochastic-approximation-style step sizes, requiring neither strong convexity nor smoothness. Contributions/Results: (1) We establish the first rigorous time-uniform probabilistic guarantee: with probability at least 1 − α, a data-dependent upper bound on f(x̄ₜ) − f* holds simultaneously for *all* t, where f* is the optimal value, so stopping the first time the bound drops below ε certifies ε-optimality at the stopping time; (2) we prove the induced stopping time is almost surely finite; (3) the rule relies solely on observable trajectory data, with no additional sampling, oracle queries, or model assumptions. This framework provides both theoretical foundations and practical tools for online monitoring and adaptive termination of SGD.
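A minimal sketch of the loop the summary describes: projected SGD with step-size-weighted averaging and an anytime-valid stopping check. The paper's actual confidence bound is not reproduced on this page, so `conf_bound` below is a placeholder the caller must supply; the ball projection, `radius`, the $1/\sqrt{t}$ step size, and the $\eta_t$-weighted averaging are likewise illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto a ball of the given radius; a stand-in
    for projection onto the paper's (unspecified here) convex feasible set."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def sgd_with_anytime_stop(grad_oracle, x0, eps, conf_bound,
                          radius=10.0, max_iter=100_000):
    """Projected SGD that stops once an anytime-valid upper confidence
    bound on the weighted-average suboptimality falls below eps.

    conf_bound(t, sum_eta, sum_eta2_g2) -> float is a PLACEHOLDER for the
    paper's supermartingale-based bound U_t; its exact form is not given
    on this page, so the caller must supply it.
    """
    x = np.asarray(x0, dtype=float)
    weighted_sum = np.zeros_like(x)  # running sum of eta_t * x_t
    sum_eta = 0.0                    # running sum of eta_t
    sum_eta2_g2 = 0.0                # running sum of eta_t^2 * ||g_t||^2
    bound = np.inf
    for t in range(1, max_iter + 1):
        eta = 1.0 / np.sqrt(t)       # illustrative stochastic-approximation step size
        g = grad_oracle(x)           # assumed unbiased stochastic (sub)gradient
        x = project_ball(x - eta * g, radius)
        weighted_sum += eta * x
        sum_eta += eta
        sum_eta2_g2 += eta**2 * float(np.dot(g, g))
        bound = conf_bound(t, sum_eta, sum_eta2_g2)
        if bound <= eps:             # time-uniform validity makes this check safe
            break
    return weighted_sum / sum_eta, t, bound
```

Because the bound is valid uniformly over time, checking it at every iteration and stopping the first time it dips below `eps` does not invalidate the $1-\alpha$ guarantee, which is the point of the anytime-valid construction.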

📝 Abstract
We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-\alpha$ and is almost surely finite under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.
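In symbols (notation ours, introduced for illustration: $\bar{x}_t$ the weighted-average iterate, $U_t$ the data-dependent bound, $\tau$ the induced stopping time), the abstract's guarantees read:

$$\mathbb{P}\big(\forall\, t \ge 1:\ f(\bar{x}_t) - f^\star \le U_t\big) \ge 1 - \alpha,$$

$$\tau = \inf\{t \ge 1 : U_t \le \varepsilon\}, \qquad \mathbb{P}\big(f(\bar{x}_\tau) - f^\star \le \varepsilon\big) \ge 1 - \alpha, \qquad \tau < \infty \ \text{a.s.}$$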
Problem

Research questions and friction points this paper is trying to address.

How to construct anytime-valid confidence sequences for SGD suboptimality.
How to obtain statistically valid stopping rules for convex optimization.
How to certify finite-time ε-optimality without smoothness or strong convexity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Anytime-valid confidence sequences for SGD
Upper confidence bounds constructed from nonnegative supermartingales (see the note below)
Stopping rule ensures ε-optimality with statistical guarantees
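For context on the supermartingale bullet above (the paper's specific construction is not reproduced on this page), the standard device behind such time-uniform bounds is Ville's inequality: any nonnegative supermartingale $(M_t)_{t \ge 0}$ with $M_0 = 1$ satisfies

$$\mathbb{P}\big(\exists\, t \ge 0 : M_t \ge 1/\alpha\big) \le \alpha,$$

so solving $M_t < 1/\alpha$ for the suboptimality yields an upper confidence bound that holds simultaneously over all times.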