Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses statistical inference from SGD iterates when stochastic gradients have infinite variance, a setting in which conventional methods break down because the limiting distributions depend on unknown nuisance parameters. The authors propose a model-agnostic self-normalization approach that constructs confidence regions from the joint weak convergence of the Polyak–Ruppert averaged estimator and an empirical second-moment normalizer built from the stochastic gradients along the SGD trajectory, with critical values calibrated by subsampling. The method avoids estimating tail indices, slowly varying functions, or stable-law parameters, yielding a unified, asymptotically valid inference framework for both finite- and infinite-variance settings. Simulation studies show reliable coverage across various settings.
📝 Abstract
Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
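The pipeline described in the abstract (averaged SGD, a second-moment normalizer built from the gradients, a self-normalized statistic, and subsampling for critical values) can be illustrated with a rough one-dimensional sketch. This is not the authors' procedure: the quadratic objective, the symmetric Pareto noise with tail index 1.5 (finite mean, infinite variance), the step-size schedule, the block length, and all function names are assumptions chosen for illustration. The sketch also relies on the toy objective having unit curvature, so no Hessian term appears in the statistic.

```python
import math
import random


def run_sgd(n, theta_star=1.0, alpha=1.5, seed=0):
    """SGD on the toy objective 0.5 * (theta - theta_star)^2 (an assumption).

    Gradient noise is symmetric Pareto with tail index alpha; for
    1 < alpha < 2 it has mean zero but infinite variance.
    """
    rng = random.Random(seed)
    theta = 0.0
    iterates, grads = [], []
    for t in range(1, n + 1):
        noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)
        g = (theta - theta_star) + noise      # stochastic gradient at current iterate
        theta -= 0.5 * t ** -0.7 * g          # assumed step-size schedule
        iterates.append(theta)
        grads.append(g)
    return iterates, grads


def self_normalized(iterates, grads, center):
    """m * (average iterate - center) / sqrt(sum of squared gradients)."""
    m = len(iterates)
    avg = sum(iterates) / m
    v = math.sqrt(sum(g * g for g in grads))
    return m * (avg - center) / v


def subsample_ci(iterates, grads, b, level=0.95):
    """Interval for the minimizer, calibrated on overlapping blocks of length b.

    Each block statistic is centered at the full-sample average; the empirical
    quantile of its absolute value stands in for the unknown critical value.
    """
    n = len(iterates)
    theta_bar = sum(iterates) / n
    v_n = math.sqrt(sum(g * g for g in grads))
    stats = sorted(
        abs(self_normalized(iterates[s:s + b], grads[s:s + b], theta_bar))
        for s in range(0, n - b + 1, max(1, b // 2))
    )
    q = stats[min(len(stats) - 1, math.ceil(level * len(stats)) - 1)]
    half_width = q * v_n / n
    return theta_bar - half_width, theta_bar + half_width


iterates, grads = run_sgd(20000)
lo, hi = subsample_ci(iterates, grads, b=1000)
print(f"approximate 95% interval: [{lo:.4f}, {hi:.4f}]")
```

No tail index or stable-law parameter is estimated anywhere in the sketch; the heavy-tail scaling enters only through the normalizer, which is the cancellation the abstract describes. The block length `b` is a tuning choice here, and the sketch makes no claim about how the paper selects it.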
Problem

Research questions and friction points this paper is trying to address.

statistical inference
stochastic gradient descent
infinite variance
confidence regions
uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-normalized statistic
infinite variance
stochastic gradient descent
subsampling calibration
joint weak convergence