🤖 AI Summary
Constant-step-size SGD and its Ruppert–Polyak averaged variant (ASGD) lack rigorous statistical guarantees in high-dimensional settings. Method: The paper's key innovation is to transfer tools from high-dimensional time series analysis to online optimization, modeling constant-step-size SGD as a nonlinear autoregressive process. By adapting coupling techniques, it establishes geometric moment contraction of the iterate sequence, which yields asymptotic stationarity, and it combines this with high-dimensional concentration inequalities. Contribution/Results: The work delivers q-th moment convergence guarantees (for any q ≥ 2) for constant-step-size SGD and ASGD in general ℓ^s-norms, including the ℓ^∞-norm commonly used for high-dimensional sparse or structured models, and derives sharp high-probability bounds for high-dimensional ASGD. These results close a critical theoretical gap in the high-dimensional statistical analysis of constant-step-size SGD and ASGD, providing a rigorous foundation for large-scale online learning.
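As a rough sketch of the setup (the notation here is illustrative, not taken from the paper), constant-step-size SGD can be written as an iterated random function, i.e., a nonlinear autoregressive process driven by i.i.d. samples, and ASGD simply averages its trajectory:

$$
x_k \;=\; x_{k-1} - \eta\,\nabla f(x_{k-1}, \xi_k) \;=:\; G_{\eta}(x_{k-1}, \xi_k),
\qquad
\bar{x}_n \;=\; \frac{1}{n}\sum_{k=1}^{n} x_k,
$$

where $\eta > 0$ is the constant learning rate, $\xi_k$ are i.i.d. data points, and $\nabla f(\cdot, \xi_k)$ is a stochastic gradient. Viewing $(x_k)$ as a Markov chain generated by the random map $G_{\eta}(\cdot, \xi_k)$ is what makes time-series tools such as coupling applicable.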
📝 Abstract
Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing the asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q \ge 2$ in general $\ell^s$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis, which entails probabilistic bounds for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
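For context, the geometric-moment contraction invoked above is, in its standard form (an illustrative statement under generic assumptions, not quoted from the paper), a coupling condition: two copies of the chain started from different points $x_0$ and $x_0'$ but driven by the same samples contract geometrically in the $q$-th moment,

$$
\bigl(\mathbb{E}\,\|x_k(x_0) - x_k(x_0')\|_{s}^{q}\bigr)^{1/q}
\;\le\; C\,\rho^{\,k}\,\|x_0 - x_0'\|_{s},
\qquad \rho \in (0,1),\; k \ge 1,
$$

for some constant $C > 0$. Such a contraction forces the chain to forget its initialization, which is what underlies the existence of a unique stationary distribution and the asymptotic stationarity and moment convergence results described in the abstract.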