Statistical Guarantees for High-Dimensional Stochastic Gradient Descent

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Constant-step-size SGD and Ruppert–Polyak averaged SGD (ASGD) lack rigorous statistical guarantees in high-dimensional settings. Method: This paper integrates high-dimensional time series analysis into online optimization, modeling SGD as a nonlinear autoregressive process and establishing asymptotic stationarity and moment convergence of its iterate sequence. By adapting coupling techniques together with high-dimensional concentration inequalities, it proves geometric-moment contraction of the iterates. Contribution/Results: The work delivers the first general ℓ^s-norm q-th moment convergence guarantee for constant-step-size SGD and ASGD, and derives sharp high-probability error bounds under the ℓ^∞-norm. These results fill a critical theoretical gap in the high-dimensional statistical analysis of constant-step-size SGD and ASGD, providing a rigorous foundation for large-scale online learning.

📝 Abstract
Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings are rarely understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q \ge 2$ in general $\ell^s$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide sharp high-probability concentration analysis which entails the probabilistic bound of high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
Problem

Research questions and friction points this paper is trying to address.

Establish statistical guarantees for high-dimensional SGD
Analyze moment convergence in various norm spaces
Provide high-probability concentration bounds for ASGD
Innovation

Methods, ideas, or system contributions that make the work stand out.

Viewing SGD as a nonlinear autoregressive process
Adapting coupling techniques to establish geometric-moment contraction
Providing sharp high-probability concentration analysis
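The autoregressive viewpoint in the first bullet can be illustrated with a toy simulation; everything below (the quadratic loss, dimension, step size, noise scale, burn-in length) is a hypothetical choice for illustration, not taken from the paper:

```python
import numpy as np

# Minimal sketch (illustration only, not the paper's construction):
# constant-step-size SGD on a least-squares objective follows the recursion
# x_{k+1} = x_k - eta * g(x_k, xi_{k+1}), an AR(1)-type map in x, and the
# Ruppert-Polyak average is the running mean of the near-stationary iterates.

rng = np.random.default_rng(0)
d = 50                 # dimension (hypothetical choice)
eta = 0.01             # constant learning rate
n_steps = 20_000
burn_in = 5_000        # discard the initial transient before averaging
x_star = np.zeros(d)   # minimizer of the expected loss 0.5 * ||x - x*||^2

x = rng.normal(size=d)  # random initialization
avg = np.zeros(d)       # running Ruppert-Polyak average
n_avg = 0

for k in range(1, n_steps + 1):
    noise = 0.1 * rng.normal(size=d)   # stochastic-gradient noise
    grad = (x - x_star) + noise        # unbiased noisy gradient
    x = x - eta * grad                 # one constant-step SGD update
    if k > burn_in:                    # average only the stationary phase
        n_avg += 1
        avg += (x - avg) / n_avg

# l-infinity errors: averaging shrinks the stationary fluctuations of SGD
err_sgd = np.max(np.abs(x - x_star))
err_asgd = np.max(np.abs(avg - x_star))
print(err_sgd, err_asgd)
```

With a constant learning rate the last iterate keeps fluctuating around the stationary distribution, while the averaged iterate concentrates much more tightly around the minimizer, which is the behavior the paper's moment and concentration bounds quantify.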
Jiaqi Li
Department of Statistics, University of Chicago, Chicago, IL 60637
Zhipeng Lou
Department of Mathematics, University of California, San Diego, La Jolla, CA 92093
Johannes Schmidt-Hieber
University of Twente
Wei Biao Wu
University of Chicago
statistics