🤖 AI Summary
Existing theoretical frameworks struggle to characterize the generalization performance of stochastic optimization algorithms under heavy-tailed gradient noise. This work proposes a unified analytical framework that, for the first time, systematically analyzes the generalization error of clipped and normalized SGD—including their mini-batch and momentum variants—under the weak assumption that the gradient noise possesses only a bounded centered $p$-th moment with $p \in (1,2]$. By integrating truncation techniques with algorithmic stability theory, the study establishes novel stability bounds and corresponding generalization error upper bounds. These results fill a critical theoretical gap in understanding the generalization behavior of mainstream stochastic optimization methods in the presence of heavy-tailed noise.
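The heavy-tailed noise condition described above can be written out explicitly. The following is a sketch of the standard form of such an assumption (the exact constants and notation in the paper may differ): for a stochastic gradient $g(w)$ of the population objective $F$,

```latex
\mathbb{E}\left[\,\|g(w) - \nabla F(w)\|^{p}\,\right] \le \sigma^{p},
\qquad p \in (1, 2].
```

For $p = 2$ this recovers the classical bounded-variance setting; for $p < 2$ the noise may have infinite variance, which is the heavy-tailed regime targeted here.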
📝 Abstract
Empirical evidence indicates that stochastic optimization with heavy-tailed gradient noise characterizes the training of machine learning models more faithfully than the standard assumption of bounded gradient variance. Most existing works on this phenomenon focus on the convergence of optimization errors, while the analysis of generalization bounds under heavy-tailed gradient noise remains limited. In this paper, we develop a general framework for establishing generalization bounds under heavy-tailed noise. Specifically, we introduce a truncation argument to derive generalization error bounds from algorithmic stability under the assumption of a bounded $p$-th centered moment with $p\in(1,2]$. Building on this framework, we further provide stability and generalization analyses for several popular stochastic algorithms under heavy-tailed noise, including clipped and normalized stochastic gradient descent, as well as their mini-batch and momentum variants.
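To make the algorithms concrete, here is a minimal sketch (not the paper's own code) of the clipped and normalized SGD updates on a toy quadratic, with heavy-tailed noise drawn from a Pareto-type (Lomax) distribution whose tail index $a = 1.5$ gives a finite centered $p$-th moment only for $p < 1.5$, hence infinite variance. The step size, clipping threshold, and objective are illustrative assumptions.

```python
import numpy as np

def clip(g, c):
    """Clipped gradient: rescale g so its norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def normalize(g, eps=1e-12):
    """Normalized gradient: unit-norm direction of g."""
    return g / (np.linalg.norm(g) + eps)

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])   # parameters for f(w) = 0.5 * ||w||^2, so grad f(w) = w
lr, c = 0.1, 1.0            # illustrative step size and clipping threshold
for _ in range(500):
    # Lomax(a=1.5) has mean 1/(a-1) = 2; subtracting 2 centers the noise.
    # Its variance is infinite, but E|noise|^p is finite for p < 1.5.
    noise = rng.pareto(1.5, size=2) - 2.0
    g = w + noise                    # stochastic gradient with heavy-tailed noise
    w = w - lr * clip(g, c)          # clipped-SGD step (use normalize(g) for normalized SGD)

print(np.linalg.norm(w))
```

Clipping caps the per-step displacement at `lr * c`, which is what makes the iterates controllable despite infinite-variance noise; normalized SGD instead fixes the step length at `lr` exactly.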