🤖 AI Summary
This paper studies optimal first-order algorithms for convex composite optimization under heavy-tailed stochastic noise, where the objective comprises a proximable term plus a convex function whose subgradients are corrupted by heavy-tailed noise. Departing from mainstream robustification strategies such as gradient clipping or normalization, the work establishes, for the first time, that the standard accelerated stochastic proximal subgradient method (without any modification) achieves universal optimal oracle complexity: $O(1/\varepsilon^2)$ for nonsmooth objectives and $O(1/\varepsilon)$ for smooth ones, across smooth, weakly smooth, and nonsmooth settings. The analysis integrates probabilistic inequalities with convex optimization structure to rigorously characterize convergence rates under heavy-tailed noise. Numerical experiments confirm both robustness and efficiency. Crucially, this work eliminates the need for auxiliary regularization or preprocessing steps previously deemed necessary in heavy-tailed settings, thereby providing a simpler, more fundamental theoretical foundation for robust stochastic optimization.
📝 Abstract
We study convex composite optimization problems, where the objective function is given by the sum of a prox-friendly function and a convex function whose subgradients are estimated under heavy-tailed noise. Existing work often employs gradient clipping or normalization techniques in stochastic first-order methods to address heavy-tailed noise. In this paper, we demonstrate that a vanilla stochastic algorithm -- without additional modifications such as clipping or normalization -- can achieve optimal complexity for these problems. In particular, we establish that an accelerated stochastic proximal subgradient method achieves a first-order oracle complexity that is universally optimal for smooth, weakly smooth, and nonsmooth convex optimization, as well as for stochastic convex optimization under heavy-tailed noise. Numerical experiments are further provided to validate our theoretical results.
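To make the setting concrete, the following is a minimal sketch of a plain (non-accelerated) stochastic proximal gradient iteration with no clipping or normalization. It is not the accelerated method analyzed in the paper. The toy objective (a quadratic plus an L1 penalty), the symmetric Pareto-type noise model, the step size `1 / (k + 1)`, and all function names are assumptions chosen purely for illustration.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def heavy_tailed_noise(rng, alpha=1.5):
    """Symmetric Pareto-type noise (assumed model): zero mean,
    infinite variance when 1 < alpha <= 2."""
    magnitude = rng.paretovariate(alpha) - 1.0
    return magnitude if rng.random() < 0.5 else -magnitude


def objective(x, b, lam):
    """Toy composite objective: 0.5 * ||x - b||^2 + lam * ||x||_1."""
    return sum(0.5 * (xi - bi) ** 2 + lam * abs(xi) for xi, bi in zip(x, b))


def stochastic_prox_gradient(b, lam, num_iters=20000, seed=0):
    """Unclipped stochastic proximal gradient steps on the toy objective."""
    rng = random.Random(seed)
    x = [0.0] * len(b)
    for k in range(num_iters):
        eta = 1.0 / (k + 1)  # assumed step size for this 1-strongly convex toy problem
        # Noisy gradient of the smooth part: true gradient plus heavy-tailed noise.
        g = [xi - bi + heavy_tailed_noise(rng) for xi, bi in zip(x, b)]
        # Gradient step followed by the prox of eta * lam * ||.||_1.
        x = [soft_threshold(xi - eta * gi, eta * lam) for xi, gi in zip(x, g)]
    return x


b = [5.0, -4.0, 0.0]
lam = 0.1
x_final = stochastic_prox_gradient(b, lam)
print("objective at start:", objective([0.0, 0.0, 0.0], b, lam))
print("objective at end:  ", objective(x_final, b, lam))
```

The only point of the sketch is that the update uses the raw noisy gradient directly; whether such an iteration attains optimal complexity is what the paper's analysis addresses.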