Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of high-probability convergence guarantees for stochastic subspace methods in non-convex optimization under heavy-tailed noise. The paper first establishes a high-probability convergence bound for randomized subspace SGD (RS-SGD) under sub-Gaussian noise, matching the oracle complexity of prior in-expectation results. It then proposes randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates and achieves both in-expectation and high-probability convergence under the mild assumption that the gradient noise has bounded $p$-th moments. Notably, RS-NSGD can attain better oracle complexity than full-dimensional normalized SGD in heavy-tailed settings, highlighting its theoretical and practical advantages.

📝 Abstract
Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.
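To make the algorithmic idea concrete, the following is a minimal sketch of an RS-NSGD-style iteration: sample a random subspace via a Gaussian sketch matrix, project the stochastic gradient onto it, normalize the projected direction, and map the unit direction back to the full space. Function and parameter names (`grad_oracle`, `s`, `step_sizes`) are hypothetical, and the paper's exact sketch scaling and step-size schedule are not reproduced here.

```python
import numpy as np

def rs_nsgd(grad_oracle, x0, s, step_sizes, n_iters, seed=None):
    """Sketch of randomized subspace normalized SGD (RS-NSGD).

    A simplified illustration, not the paper's exact algorithm:
    the subspace dimension `s` and the step-size schedule are
    assumptions chosen for demonstration.
    """
    rng = np.random.default_rng(seed)
    d = x0.size
    x = x0.astype(float).copy()
    for t in range(n_iters):
        # Sample a random subspace: d x s Gaussian sketch with E[P P^T] = I.
        P = rng.standard_normal((d, s)) / np.sqrt(s)
        g = grad_oracle(x)          # stochastic gradient (possibly heavy-tailed)
        g_sub = P.T @ g             # project the gradient onto the subspace
        norm = np.linalg.norm(g_sub)
        if norm > 0:
            # Direction normalization: the unit-norm subspace direction
            # bounds the update magnitude, which controls heavy-tailed noise.
            x = x - step_sizes[t] * (P @ (g_sub / norm))
    return x
```

As a usage sketch, minimizing $f(x) = \tfrac{1}{2}\|x\|^2$ with a noisy gradient oracle `lambda x: x + 0.1 * rng.standard_normal(x.size)` and a decaying schedule such as `step_sizes = [0.5 / np.sqrt(t + 1) for t in range(200)]` drives the iterate norm well below its initial value, while each iteration only touches an `s`-dimensional sketch of the gradient.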
Problem

Research questions and friction points this paper is trying to address.

nonconvex optimization
heavy-tailed noise
randomized subspace methods
high-probability convergence
stochastic gradient descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

randomized subspace
normalized SGD
heavy-tailed noise
high-probability convergence
oracle complexity
Gaku Omiya
Department of Mathematical Informatics, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Pierre-Louis Poirion
RIKEN
Mathematical Optimization, O.R.
Akiko Takeda
Department of Mathematical Informatics, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan