Online Convex Optimization with Heavy Tails: Old Algorithms, New Regrets, and Applications

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies online convex optimization (OCO) under heavy-tailed gradient noise, where the stochastic gradient admits only a finite $p$-th central moment for some $p \in (1,2]$. Prior work in this regime either lacked theoretical guarantees or relied on gradient clipping and knowledge of the moment parameter; this paper instead shows that unmodified classical algorithms (e.g., online gradient descent and optimistic OGD) achieve *fully optimal regret bounds*. The bounds adapt to the unknown moment parameter $p$, require no gradient clipping, and cover both smooth and nonsmooth objectives under only the standard bounded-domain assumption. As an application, the paper gives the first provable convergence result for nonsmooth *nonconvex* optimization under heavy-tailed noise without clipping, significantly broadening the theoretical foundation of OCO in robust learning, financial modeling, and other high-noise domains.
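For reference, the standard formalization of this setting (this is the usual textbook form of the heavy-tail condition and the regret; the exact constants and norms may differ from the paper's):

```latex
% Heavy-tail condition: the stochastic (sub)gradient g_t satisfies, for
% some sigma >= 0 and p in (1, 2],
\[
  \mathbb{E}\!\left[\|g_t - \nabla f_t(x_t)\|^{\mathsf{p}}\right] \le \sigma^{\mathsf{p}},
  \qquad \mathsf{p} \in (1, 2],
\]
% so for p < 2 the variance need not exist. Regret of the iterates
% x_1, ..., x_T against the best fixed comparator in the bounded domain X:
\[
  \operatorname{Reg}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x).
\]
```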

📝 Abstract
In Online Convex Optimization (OCO), when the stochastic gradient has a finite variance, many algorithms provably work and guarantee a sublinear regret. However, limited results are known if the gradient estimate has a heavy tail, i.e., the stochastic gradient only admits a finite $\mathsf{p}$-th central moment for some $\mathsf{p}\in\left(1,2\right]$. Motivated by this, this work examines different old algorithms for OCO (e.g., Online Gradient Descent) in the more challenging heavy-tailed setting. Under the standard bounded domain assumption, we establish new regrets for these classical methods without any algorithmic modification. Remarkably, these regret bounds are fully optimal in all parameters (can be achieved even without knowing $\mathsf{p}$), suggesting that OCO with heavy tails can be solved effectively without any extra operation (e.g., gradient clipping). Our new results have several applications. A particularly interesting one is the first provable convergence result for nonsmooth nonconvex optimization under heavy-tailed noise without gradient clipping. Furthermore, we explore broader settings (e.g., smooth OCO) and extend our ideas to optimistic algorithms to handle different cases simultaneously.
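To make the "no modification" point concrete, here is a minimal, hypothetical sketch of plain projected OGD run against a simulated heavy-tailed gradient oracle. The quadratic objective, the Pareto noise model, and the step size are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def proj_ball(v, radius=1.0):
    """Euclidean projection onto {x : ||x|| <= radius} (the standard
    bounded-domain assumption)."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def heavy_tailed_grad(x, rng, p=1.5):
    """Hypothetical oracle: gradient of f(x) = 0.5 * ||x - x*||^2 plus
    symmetric Pareto noise with tail index p + 0.1, so the p-th central
    moment is finite while the variance is infinite."""
    x_star = np.full_like(x, 0.3)
    noise = rng.choice([-1.0, 1.0], size=x.shape) * rng.pareto(p + 0.1, size=x.shape)
    return (x - x_star) + noise

def ogd(T=10_000, dim=5, eta0=0.1, seed=0):
    """Plain projected OGD with the classical eta0 / sqrt(t) step size:
    no clipping, and no knowledge of the moment parameter p."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    avg = np.zeros(dim)
    for t in range(1, T + 1):
        g = heavy_tailed_grad(x, rng)
        x = proj_ball(x - (eta0 / np.sqrt(t)) * g)
        avg += (x - avg) / t  # running average of the played iterates
    return avg

print(ogd())  # tends toward x* = (0.3, ..., 0.3) despite the infinite variance
```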
Problem

Research questions and friction points this paper is trying to address.

Optimizing online convex functions under heavy-tailed gradient noise
Analyzing classical algorithms, without modification, in the heavy-tailed setting
Establishing optimal regret bounds without gradient clipping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Old (classical) algorithms analyzed under heavy-tailed gradients
Fully optimal regret bounds with no algorithmic modifications
Clipping-free guarantees for nonsmooth nonconvex optimization
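The abstract also notes an extension to optimistic algorithms. Below is a minimal sketch of optimistic OGD with the common last-gradient hint, reusing the same kind of hypothetical heavy-tailed oracle as above; it illustrates the algorithm family, not the paper's exact scheme:

```python
import numpy as np

def proj_ball(v, radius=1.0):
    """Projection onto the Euclidean ball (bounded-domain assumption)."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def noisy_grad(x, rng, p=1.5):
    """Hypothetical heavy-tailed oracle, as in the earlier sketch."""
    x_star = np.full_like(x, 0.3)
    noise = rng.choice([-1.0, 1.0], size=x.shape) * rng.pareto(p + 0.1, size=x.shape)
    return (x - x_star) + noise

def optimistic_ogd(T=10_000, dim=5, eta=0.05, seed=0):
    """Optimistic OGD with the last-gradient hint m_t = g_{t-1}:
        play   x_t     = Pi(z_t - eta * m_t)
        update z_{t+1} = Pi(z_t - eta * g_t)
    Again, no clipping and no knowledge of p."""
    rng = np.random.default_rng(seed)
    z = np.zeros(dim)
    m = np.zeros(dim)               # hint: the previous stochastic gradient
    avg = np.zeros(dim)
    for t in range(1, T + 1):
        x = proj_ball(z - eta * m)  # optimistic step against the hint
        g = noisy_grad(x, rng)
        z = proj_ball(z - eta * g)  # base OGD update
        m = g
        avg += (x - avg) / t        # running average of the played iterates
    return avg

print(optimistic_ogd())
```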