Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Second-order optimization lacks theoretical foundations under heavy-tailed noise, where stochastic gradients and Hessians possess only finite $p$-th moments for $p \in (1,2]$. Method: we propose the first normalized stochastic gradient descent algorithm that jointly clips both gradients and Hessians, introducing Hessian clipping as a novel technique to overcome the robustness bottlenecks of existing methods. Contribution/Results: we establish the first tight lower bound on sample complexity under $p$-th moment assumptions. The algorithm achieves the optimal convergence rate with high probability, and its sample complexity matches this lower bound, significantly outperforming conventional second-order methods. This work fills a fundamental gap in the theory of second-order optimization under heavy-tailed noise and offers a new paradigm for robust optimization in non-stationary, outlier-contaminated, or otherwise unstable stochastic environments.
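The joint clipping the summary describes can be illustrated with a simple norm-rescaling operator applied to both the stochastic gradient (a vector) and the stochastic Hessian (a matrix). This is only a minimal sketch: the summary does not specify which matrix norm or clipping rule the paper uses, so the Frobenius norm and the thresholds below are illustrative assumptions.

```python
import numpy as np

def clip_to_norm(x, tau):
    """Rescale x so its norm is at most tau; identity if already within.
    For a vector this is the Euclidean norm; for a matrix, numpy's
    default is the Frobenius norm (an assumption, not the paper's choice)."""
    norm = np.linalg.norm(x)
    return x if norm <= tau else (tau / norm) * x

# Gradient clipping: stochastic gradient with Euclidean norm 5,
# rescaled down to the threshold tau_g = 2.
g = np.array([3.0, 4.0])
g_clipped = clip_to_norm(g, 2.0)

# Hessian clipping: the same operator applied to the matrix estimate.
H = np.array([[3.0, 0.0], [0.0, 4.0]])
H_clipped = clip_to_norm(H, 2.0)
```

Rescaling (rather than truncating entries) preserves the direction of the gradient and the eigenstructure of the Hessian while bounding the magnitude of any single heavy-tailed sample.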

📝 Abstract
Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in light-tailed or bounded-noise settings, such algorithms are often brittle and lack guarantees under heavy-tailed noise, precisely the regimes where robustness is most critical. In this work, we take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We consider a setting where stochastic gradients and Hessians have only bounded $p$-th moments, for some $p \in (1,2]$, and establish tight lower bounds on the sample complexity of any second-order method. We then develop a variant of normalized stochastic gradient descent that leverages second-order information and provably matches these lower bounds. To address the instability caused by large deviations, we introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits. Our results provide the first comprehensive sample complexity characterization for second-order optimization under heavy-tailed noise. This positions Hessian clipping as a robust and theoretically sound strategy for second-order algorithm design in heavy-tailed regimes.
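The abstract's combination of normalized SGD, second-order information, and clipping might be sketched as a single update step. The exact update rule is not given in this summary, so everything below is a hypothetical illustration: the regularized Newton-type direction, the unit regularizer, and the step-size normalization are all assumptions standing in for the paper's construction.

```python
import numpy as np

def clip_to_norm(x, tau):
    """Rescale x so its norm is at most tau (Frobenius norm for matrices)."""
    norm = np.linalg.norm(x)
    return x if norm <= tau else (tau / norm) * x

def normalized_second_order_step(x, grad, hess, eta, tau_g, tau_h):
    """One hypothetical iteration: clip both stochastic estimates, form a
    Hessian-informed direction, then take a step of fixed length eta.
    The regularizer I keeps the solve well-posed in this toy sketch."""
    g = clip_to_norm(grad, tau_g)
    H = clip_to_norm(hess, tau_h)
    d = np.linalg.solve(H + np.eye(len(x)), g)   # second-order direction
    return x - eta * d / max(np.linalg.norm(d), 1e-12)  # normalized step
```

Normalizing the step decouples progress per iteration from the (possibly huge) magnitude of a heavy-tailed sample, while clipping bounds how far any single corrupted gradient or Hessian can tilt the direction.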
Problem

Research questions and friction points this paper is trying to address.

Characterizing sample complexity limits for second-order optimization under heavy-tailed noise
Developing robust algorithms using gradient and Hessian clipping techniques
Establishing theoretical guarantees for second-order methods with bounded moments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hessian clipping addresses heavy-tailed noise instability
Algorithm matches tight sample complexity lower bounds
Normalized SGD leverages second-order information efficiently