Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper addresses high-probability convergence of online learning under heavy-tailed noise. Existing stochastic gradient descent (SGD) analyses require finite moment assumptions on the noise, e.g., a bounded $p$-th moment with $p \in (1,2]$. To overcome this, the authors propose a unified nonlinear SGD framework that treats the nonlinearity as a black box, accommodating sign, quantization, and component-wise or joint clipping. They establish general high-probability convergence guarantees *without any noise-moment assumptions*, and further handle biased, mixture-type heavy-tailed noise, for which the iterates converge to a neighbourhood of stationarity. Unlike state-of-the-art results, whose rate exponents depend on the noise moment and vanish as $p \rightarrow 1$, the exponents here are constant: for non-convex objectives the squared gradient norm converges at rate $\widetilde{\mathcal{O}}(t^{-1/4})$, while under strong convexity the last iterate converges at rate $\mathcal{O}(t^{-\zeta})$ for some $\zeta \in (0,1)$ depending on noise and problem parameters. Empirical results confirm that the choice of nonlinear operator significantly impacts performance, and that clipping is not always optimal.

📝 Abstract
We study high-probability convergence in online learning, in the presence of heavy-tailed noise. To combat the heavy tails, a general framework of nonlinear SGD methods is considered, subsuming several popular nonlinearities like sign, quantization, component-wise and joint clipping. In our work the nonlinearity is treated in a black-box manner, allowing us to establish unified guarantees for a broad range of nonlinear methods. For symmetric noise and non-convex costs we establish convergence of gradient norm-squared, at a rate $\widetilde{\mathcal{O}}(t^{-1/4})$, while for the last iterate of strongly convex costs we establish convergence to the population optima, at a rate $\mathcal{O}(t^{-\zeta})$, where $\zeta \in (0,1)$ depends on noise and problem parameters. Further, if the noise is a (biased) mixture of symmetric and non-symmetric components, we show convergence to a neighbourhood of stationarity, whose size depends on the mixture coefficient, nonlinearity and noise. Compared to state-of-the-art, which only considers clipping and requires unbiased noise with bounded $p$-th moments, $p \in (1,2]$, we provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments. While the rate exponents in state-of-the-art depend on noise moments and vanish as $p \rightarrow 1$, our exponents are constant and strictly better whenever $p < 6/5$ for non-convex and $p < 8/7$ for strongly convex costs. Experiments validate our theory, showing that clipping is not always the optimal nonlinearity, further underlining the value of a general framework.
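As a rough illustration of the framework described in the abstract (not the paper's implementation), the sketch below runs the nonlinear SGD template $x_{t+1} = x_t - \alpha_t \Psi(\nabla f(x_t) + n_t)$ with three of the nonlinearities the paper subsumes, under symmetric heavy-tailed (Cauchy) noise. The quadratic cost, step-size schedule, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical quadratic cost f(x) = 0.5 * ||x - x_star||^2 (chosen for illustration).
x_star = np.array([1.0, -2.0, 3.0])

def grad(x):
    return x - x_star

# Examples of the black-box nonlinearity Psi covered by the framework.
def sign_op(g):
    return np.sign(g)

def clip_joint(g, M=1.0):  # joint clipping: g * min(1, M / ||g||)
    n = np.linalg.norm(g)
    return g if n <= M else (M / n) * g

def clip_componentwise(g, m=1.0):  # clip each coordinate to [-m, m]
    return np.clip(g, -m, m)

def nonlinear_sgd(psi, T=20000, a=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros_like(x_star)
    for t in range(1, T + 1):
        # Symmetric heavy-tailed noise: a standard Cauchy has no finite mean,
        # so plain SGD has no moment-based guarantees here.
        g = grad(x) + rng.standard_cauchy(size=x.shape)
        x = x - (a / t**0.75) * psi(g)  # decaying step size (illustrative choice)
    return x

for psi in (sign_op, clip_joint, clip_componentwise):
    print(psi.__name__, np.round(nonlinear_sgd(psi), 2))
```

Because the nonlinearity bounds the update and the noise is symmetric, each variant drifts toward the optimum despite the noise having no finite mean; comparing the three runs mirrors the paper's empirical point that clipping is not the only (or necessarily best) choice.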
Problem

Research questions and friction points this paper is trying to address.

Study high-probability convergence in online learning with heavy-tailed noise
Develop a unified framework for nonlinear SGD methods handling various nonlinearities
Establish convergence guarantees for non-convex and strongly convex costs under symmetric and mixed noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for nonlinear SGD methods
Black-box treatment of nonlinearities
High-probability convergence guarantees
Aleksandar Armacki
Carnegie Mellon University, Pittsburgh, PA, USA
Shuhua Yu
Carnegie Mellon University, Pittsburgh, PA, USA
Pranay Sharma
C-MInDS, IIT Bombay
Gauri Joshi
Carnegie Mellon University
D. Bajović
Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
D. Jakovetić
Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia
S. Kar
Carnegie Mellon University, Pittsburgh, PA, USA