When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold

📅 2026-04-25

🤖 AI Summary

This work investigates how quickly the preconditioning matrix must stabilize in dynamically preconditioned stochastic approximation for the Polyak–Ruppert central limit theorem (CLT) to remain valid. Using an exact decomposition that isolates the preconditioner's influence, the authors split the averaged error into a martingale term, a Taylor remainder, and a dynamic remainder, which is the only term that depends on the preconditioner. They show that the CLT holds whenever the stabilization rate satisfies β > (α+1)/2, where α ∈ (1/2, 1) is the step-size exponent, and that this threshold cannot be weakened within the class of polynomial-rate hypotheses used in the upper bound. A single L² operator-norm stabilization argument covers SA-AdaGrad, SA-RMSProp, and SA-ONS. When the stabilization rate exceeds the threshold, dynamically preconditioned averaged SGD satisfies the CLT, and under bounded inputs it retains the n^(-1/6) Wasserstein rate, giving theoretical support for online statistical inference.
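
As a quick check of the threshold arithmetic, using the step-size exponent range α ∈ (1/2, 1) and the value α* = 2/3 stated in the abstract:

$$
\alpha \in \left(\tfrac{1}{2},\, 1\right) \;\Longrightarrow\; \frac{\alpha+1}{2} \in \left(\tfrac{3}{4},\, 1\right),
\qquad
\alpha^* = \tfrac{2}{3} \;\Longrightarrow\; \frac{\alpha^*+1}{2} = \tfrac{5}{6} < 1 .
$$

A stabilization rate of β = 1 therefore clears the threshold for every admissible α, whereas β = 3/4 does not clear it for any α in that range.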

📝 Abstract

Polyak-Ruppert averaging yields an asymptotically normal estimator with sandwich covariance $H^{-1}SH^{-1}$, the foundation of online inference. When the gradient step is preconditioned by a data-driven matrix $P_t$, we ask how fast $P_t$ must stabilize for the central limit theorem (CLT) to remain valid. We resolve this via an exact preconditioner-isolating decomposition of the averaged error that confines $P_t$ to a dynamic remainder $R_n$, leaving the martingale and Taylor terms preconditioner-free. Let $M_t = (P_t H)^{-1}$ denote the effective inverse drift matrix, with $\|M_t - M_{t-1}\|_{\mathrm{op}} \lesssim t^{-\beta}$ and step-size exponent $\alpha \in (1/2, 1)$. We identify a stabilization-rate threshold $\beta > (\alpha+1)/2$ and prove that, within the class of polynomial rate hypotheses used in our upper bound, it cannot be weakened: the dynamic remainder $\sqrt{n}\,R_n$ vanishes in $L^2$ whenever $\beta > (\alpha+1)/2$, and we exhibit sequences satisfying those hypotheses for which it does not vanish when $\beta \le (\alpha+1)/2$. A single stabilization argument certifies three SA variants (SA-AdaGrad, SA-RMSProp, and SA-ONS) with gain $\rho_t = c/t$, each delivering one-step $L^2(\mathrm{op})$ stabilization of order $t^{-1}$, yielding the CLT $\sqrt{n}(\bar{x}_n - x^*) \to N(0, H^{-1}SH^{-1})$; under bounded inputs the pathwise rate $\beta = 1$ further preserves the $n^{-1/6}$ Wasserstein rate at $\alpha^* = 2/3$. Under standard regularity conditions, Wald-type online inference remains valid for dynamically preconditioned averaged SGD whose stabilization rate exceeds the threshold.
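
To make the setting concrete, here is a minimal Python sketch of averaged SGD with a slowly stabilizing diagonal preconditioner. It is an illustration under assumptions, not the paper's algorithm: this page does not give the exact SA-AdaGrad recursion, so the running second-moment update with gain `rho_t = c/t`, the least-squares model, the function name `preconditioned_averaged_sgd`, and all default parameter values are hypothetical choices made for the example.

```python
import numpy as np


def preconditioned_averaged_sgd(n=20000, alpha=2 / 3, gamma0=0.5, c=1.0, seed=0):
    """Averaged SGD on a toy least-squares problem with a diagonal preconditioner.

    Assumed (hypothetical) setup: y = a^T x_star + noise with standard normal
    a and noise, step size gamma_t = gamma0 * t^(-alpha), and a preconditioner
    built from a running average of squared gradients with gain rho_t = c / t.
    """
    rng = np.random.default_rng(seed)
    x_star = np.array([1.0, -0.5, 0.25])  # ground-truth parameter (assumed)
    d = x_star.size

    x = np.zeros(d)      # current iterate x_t
    x_bar = np.zeros(d)  # Polyak-Ruppert average of the iterates
    G = np.ones(d)       # running estimate of the squared-gradient moments
    eps = 1e-8           # guards against division by zero

    for t in range(1, n + 1):
        a = rng.standard_normal(d)
        y = a @ x_star + rng.standard_normal()
        g = a * (a @ x - y)           # stochastic gradient of 0.5 * (a^T x - y)^2

        rho = min(1.0, c / t)         # gain rho_t = c / t, capped at 1
        G += rho * (g**2 - G)         # one-step change in G is of order 1/t
        P = 1.0 / np.sqrt(G + eps)    # diagonal preconditioner P_t

        x = x - gamma0 * t ** (-alpha) * P * g  # preconditioned SGD step
        x_bar += (x - x_bar) / t                # online average of x_1..x_t

    return x_bar, x_star


if __name__ == "__main__":
    x_bar, x_star = preconditioned_averaged_sgd()
    print("averaged estimate:", x_bar)
    print("error norm:", np.linalg.norm(x_bar - x_star))
```

Because the second-moment estimate changes by a term of order `1/t` per step, this toy preconditioner is meant to mimic the $t^{-1}$ one-step stabilization described above; the sketch does not verify the CLT or estimate the sandwich covariance.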

Problem

Research questions and friction points this paper is trying to address.

dynamic preconditioning

Polyak-Ruppert averaging

central limit theorem

stabilization rate

stochastic approximation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic preconditioning

Polyak-Ruppert averaging

central limit theorem