Towards Weaker Variance Assumptions for Stochastic Optimization

📅 2025-04-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work revisits a classical but often overlooked weak variance assumption for stochastic subgradient methods, namely that the squared norm (or variance) of the stochastic subgradient may grow at most linearly with the squared norm of the iterate. The paper traces the assumption's origins to the 1960s, clarifies its theoretical significance, and builds on and extends a recently made connection to the Halpern iteration. Under this assumption, the authors analyze horizon-free, anytime stochastic subgradient algorithms with last-iterate convergence rates for convex nonsmooth, potentially stochastic, optimization. The framework extends to convex optimization with functional constraints and to regularized convex-concave min-max problems, giving rates for natural optimality measures that do not require a bounded feasible set. The core contributions are threefold: (i) placing the weak variance assumption on a firm historical and theoretical footing; (ii) linking it to fixed-point (Halpern) iteration; and (iii) removing the traditional reliance on bounded variance and bounded domains.
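
As a rough illustration (the notation and constants below are ours, not taken from the paper), the weak variance assumption can be written for a stochastic subgradient oracle $g(x,\xi)$ of the objective as
\[
\mathbb{E}_{\xi}\big[\|g(x,\xi)\|^{2}\big] \;\le\; \sigma_{0}^{2} + \sigma_{1}^{2}\,\|x\|^{2} \qquad \text{for all } x,
\]
where $\sigma_{0},\sigma_{1}\ge 0$ are problem-dependent constants. The classical bounded-subgradient (or bounded-variance) setting corresponds to $\sigma_{1}=0$, so this assumption is a strict relaxation of it.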

📝 Abstract
We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
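
For context, the Halpern iteration mentioned above is, in its standard fixed-point form (sketched here for illustration, not as the paper's specific algorithm), an update anchored to the starting point $x_{0}$: for a nonexpansive operator $T$,
\[
x_{k+1} \;=\; \beta_{k}\,x_{0} \;+\; (1-\beta_{k})\,T(x_{k}), \qquad \beta_{k}\in(0,1),
\]
with a diminishing anchoring weight, a common choice being $\beta_{k}=\frac{1}{k+2}$. Halpern-type anchoring is a standard route to last-iterate guarantees in fixed-point theory; the abstract builds on and extends a recently made connection between this scheme and the weak variance assumption.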
Problem

Research questions and friction points this paper is trying to address.

Analyzing stochastic gradient algorithms with weaker variance assumptions
Extending the analysis to non-Lipschitz nonsmooth convex optimization
Developing horizon-free algorithms for stochastic convex optimization problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weaker variance assumptions for stochastic optimization
Horizon-free anytime algorithms with last-iterate rates
Optimality measures that do not require a bounded feasible set
Ahmet Alacaoglu
University of British Columbia
optimization, machine learning
Yura Malitsky
Faculty of Mathematics, University of Vienna
Stephen J. Wright
Department of Computer Sciences, University of Wisconsin–Madison