Lower Bounds and Proximally Anchored SGD for Non-Convex Minimization Under Unbounded Variance

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of oracle-complexity lower bounds and matching algorithms for non-convex stochastic optimization under unbounded variance. It focuses on the Blum-Gladyshev (BG-0) condition, which lets the gradient variance grow quadratically with distance, and establishes information-theoretic lower bounds: finding an ε-stationary point requires Ω(ε⁻⁶) stochastic gradient queries in the smooth case and Ω(ε⁻⁴) in the mean-square smooth case. To match these bounds, the authors propose Proximally Anchored STochastic Approximation (PASTA), a unified framework that couples Halpern anchoring with Tikhonov regularization and operates on unbounded domains with unbounded stochastic gradients. PASTA attains minimax-optimal complexities across several non-convex function classes, including standard smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions.
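To make the two ingredients named above concrete, here is a minimal sketch of how a Halpern-style anchor average and a Tikhonov penalty toward the anchor can be combined with noisy gradient steps. It is an illustration under stated assumptions, not the paper's PASTA update: the function names `bg0_oracle` and `anchored_sgd`, the schedule `beta = 1 / (k + 2)`, the fixed step size `eta`, the penalty weight `lam`, and the Gaussian noise model with variance `sigma^2 * (1 + ||x - x_ref||^2)` are all hypothetical choices made for this example.

```python
import numpy as np


def bg0_oracle(grad, x, x_ref, sigma, rng):
    """Toy stochastic gradient (assumed noise model, not from the paper).

    The total noise variance is sigma^2 * (1 + ||x - x_ref||^2), i.e. it
    grows quadratically with the distance from a reference point.
    """
    scale = sigma * np.sqrt(1.0 + np.sum((x - x_ref) ** 2))
    # Divide by sqrt(d) so the summed variance over all coordinates is scale^2.
    noise = rng.standard_normal(x.shape) * scale / np.sqrt(x.size)
    return grad(x) + noise


def anchored_sgd(grad, x0, steps, eta, lam, sigma, seed=0):
    """Illustrative anchored, Tikhonov-regularized SGD loop (hypothetical)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for k in range(steps):
        g = bg0_oracle(grad, x, x0, sigma, rng)
        # Tikhonov term: gradient of (lam / 2) * ||x - x0||^2.
        g_reg = g + lam * (x - x0)
        y = x - eta * g_reg
        # Halpern-style anchoring: convex combination with the anchor x0.
        beta = 1.0 / (k + 2)
        x = beta * x0 + (1.0 - beta) * y
    return x


if __name__ == "__main__":
    target = np.array([1.0, 1.0])
    grad = lambda x: x - target  # gradient of 0.5 * ||x - target||^2
    x_final = anchored_sgd(grad, np.zeros(2), steps=2000, eta=0.1, lam=0.01, sigma=0.1)
    print("final gradient norm:", np.linalg.norm(grad(x_final)))
```

Both the anchor weight and the penalty pull iterates back toward `x0`, which limits how far they can drift into regions where this noise model is large; the schedules that achieve the stated rates are given in the paper, not here.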

📝 Abstract

Analysis of Stochastic Gradient Descent (SGD) and its variants typically relies on the assumption of uniformly bounded variance, a condition that frequently fails in practical non-convex settings, such as neural network training, as well as in several elementary optimization settings. While several relaxations are explored in the literature, the Blum-Gladyshev (BG-0) condition, which permits the variance to grow quadratically with distance, has recently been shown to be the weakest condition. However, the oracle complexity of stochastic first-order non-convex optimization under BG-0 remains underexplored. In this paper, we address this gap and establish information-theoretic lower bounds, proving that finding an $ε$-stationary point requires $Ω(ε^{-6})$ stochastic BG-0 oracle queries for smooth functions and $Ω(ε^{-4})$ queries under mean-square smoothness. These limits demonstrate an unavoidable degradation from classical bounded-variance complexities, i.e., $Ω(ε^{-4})$ and $Ω(ε^{-3})$ for smooth and mean-square smooth cases, respectively. To match these lower bounds, we consider Proximally Anchored STochastic Approximation (PASTA), a unified algorithmic framework that couples Halpern anchoring with Tikhonov regularization to dynamically mitigate the extra variance explosion term permitted by the BG-0 oracle. We prove that PASTA achieves minimax optimal complexities across numerous non-convex regimes, including standard smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions, entirely under an unbounded domain and unbounded stochastic gradients.
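To get a feel for the degradation quoted in the abstract, the short sketch below tabulates the four leading-order exponents for a given target accuracy. It is only arithmetic on the stated rates: the helper name `query_orders` and the parameter `inv_eps` (standing for $1/ε$) are introduced here for illustration, and the numbers drop all constants and any dependence on smoothness or noise parameters hidden inside $Ω(\cdot)$, so they are orders of growth rather than actual query counts.

```python
def query_orders(inv_eps: int) -> dict:
    """Leading-order growth of the lower bounds for accuracy eps = 1 / inv_eps.

    Constants and all problem parameters other than eps are dropped.
    """
    return {
        "smooth, bounded variance": inv_eps ** 4,
        "smooth, BG-0": inv_eps ** 6,
        "mean-square smooth, bounded variance": inv_eps ** 3,
        "mean-square smooth, BG-0": inv_eps ** 4,
    }


if __name__ == "__main__":
    orders = query_orders(10)  # eps = 0.1
    for setting, order in orders.items():
        print(f"{setting}: {order:,}")
```

Under these rates, moving from bounded variance to BG-0 costs an extra factor of $ε^{-2}$ in the smooth case and $ε^{-1}$ in the mean-square smooth case.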

Problem

Research questions and friction points this paper is trying to address.

non-convex minimization

unbounded variance

Blum-Gladyshev condition

oracle complexity

stochastic gradient descent

Innovation

Methods, ideas, or system contributions that make the work stand out.

unbounded variance

information-theoretic lower bounds

PASTA algorithm