When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold

📅 2026-04-25
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work asks how fast the preconditioner in dynamically preconditioned stochastic approximation must stabilize for the Polyak–Ruppert central limit theorem (CLT) to remain valid. The authors give an exact decomposition of the averaged error into a martingale term, a Taylor remainder, and a dynamic remainder that alone carries the dependence on the preconditioner. With step-size exponent α ∈ (1/2, 1) and stabilization exponent β (the one-step change of the effective inverse drift matrix is of order t^(-β) in operator norm), the CLT holds whenever β > (α+1)/2, and within the class of polynomial-rate hypotheses used for the upper bound this threshold cannot be weakened. A single L² operator-norm stabilization argument covers SA-AdaGrad, SA-RMSProp, and SA-ONS with gain c/t, each of which stabilizes at order t^(-1) and therefore satisfies the CLT. Under bounded inputs, the pathwise rate β = 1 also preserves the n^(-1/6) Wasserstein rate at α = 2/3. Under standard regularity conditions, Wald-type online inference therefore remains valid for dynamically preconditioned averaged SGD whose stabilization rate exceeds the threshold.
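
As a quick check of the threshold β > (α+1)/2 stated above, with α the step-size exponent in (1/2, 1) and β the stabilization exponent as in the abstract, the arithmetic can be worked out as follows. This is an elementary calculation from the stated condition, not a reproduction of the paper's proof.

```latex
\[
\alpha \in \left(\tfrac{1}{2},\, 1\right)
\;\Longrightarrow\;
\frac{\alpha + 1}{2} \in \left(\tfrac{3}{4},\, 1\right),
\qquad
\left.\frac{\alpha + 1}{2}\right|_{\alpha = 2/3} = \frac{5}{6} < 1 .
\]
```

Since the threshold stays strictly below 1 for every admissible α, a stabilization rate of β = 1 clears it across the whole range, including at α = 2/3.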


📝 Abstract
Polyak-Ruppert averaging yields an asymptotically normal estimator with sandwich covariance $H^{-1}SH^{-1}$, the foundation of online inference. When the gradient step is preconditioned by a data-driven matrix $P_t$, we ask how fast $P_t$ must stabilize for the central limit theorem (CLT) to remain valid. We resolve this via an exact preconditioner-isolating decomposition of the averaged error that confines $P_t$ to a dynamic remainder $R_n$, leaving the martingale and Taylor terms preconditioner-free. Let $M_t = (P_t H)^{-1}$ denote the effective inverse drift matrix, with $\|M_t - M_{t-1}\|_{\mathrm{op}} \lesssim t^{-\beta}$ and step-size exponent $\alpha \in (1/2, 1)$. We identify a stabilization-rate threshold $\beta > (\alpha+1)/2$ and prove that, within the class of polynomial rate hypotheses used in our upper bound, it cannot be weakened: the dynamic remainder $\sqrt{n}\,R_n$ vanishes in $L^2$ whenever $\beta > (\alpha+1)/2$, and we exhibit sequences satisfying those hypotheses for which it does not vanish when $\beta \le (\alpha+1)/2$. A single stabilization argument certifies three SA variants (SA-AdaGrad, SA-RMSProp, and SA-ONS) with gain $\rho_t = c/t$, each delivering one-step $L^2(\mathrm{op})$ stabilization of order $t^{-1}$, yielding the CLT $\sqrt{n}(\bar{x}_n - x^*) \to N(0, H^{-1}SH^{-1})$; under bounded inputs the pathwise rate $\beta = 1$ further preserves the $n^{-1/6}$ Wasserstein rate at $\alpha^* = 2/3$. Under standard regularity conditions, Wald-type online inference remains valid for dynamically preconditioned averaged SGD whose stabilization rate exceeds the threshold.
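
To make the setting concrete, the following is a minimal sketch of Polyak-Ruppert averaging with a data-driven preconditioner updated at gain $c/t$. Everything beyond the abstract is an illustrative assumption: the function names, the synthetic linear-regression problem, the noise level, and the diagonal AdaGrad-style preconditioner are hypothetical stand-ins, not the paper's SA-AdaGrad, SA-RMSProp, or SA-ONS definitions.

```python
import numpy as np


def stabilization_threshold(alpha):
    """Return (alpha + 1) / 2, the threshold on the stabilization exponent beta."""
    if not 0.5 < alpha < 1.0:
        raise ValueError("alpha must lie in (1/2, 1)")
    return (alpha + 1.0) / 2.0


def averaged_preconditioned_sgd(n=20000, alpha=2 / 3, gamma0=0.5, c=1.0,
                                eps=1e-8, seed=0):
    """Averaged SGD on a toy least-squares problem with a diagonal preconditioner.

    Assumption: the preconditioner is 1 / sqrt(v_t), where v_t is a running
    average of squared gradients updated with gain min(1, c / t).
    """
    rng = np.random.default_rng(seed)
    x_star = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter
    x = np.zeros(3)                        # current iterate
    x_bar = np.zeros(3)                    # Polyak-Ruppert average
    v = np.ones(3)                         # second-moment estimate
    p = np.ones(3)                         # diagonal of the preconditioner

    for t in range(1, n + 1):
        a = rng.standard_normal(3)
        y = a @ x_star + 0.5 * rng.standard_normal()
        g = (a @ x - y) * a                # stochastic gradient of the squared loss

        rho = min(1.0, c / t)              # gain c/t, capped so v stays nonnegative
        v += rho * (g ** 2 - v)
        p = 1.0 / np.sqrt(v + eps)

        x = x - gamma0 * t ** (-alpha) * p * g   # step size gamma0 * t^(-alpha)
        x_bar += (x - x_bar) / t                 # running average of the iterates

    return x_bar, x_star, p


if __name__ == "__main__":
    x_bar, x_star, p = averaged_preconditioned_sgd()
    print("threshold at alpha = 2/3:", stabilization_threshold(2 / 3))
    print("averaged-iterate error:", np.linalg.norm(x_bar - x_star))
```

Because the second-moment estimate moves by a factor of order $1/t$ per step, the preconditioner in this toy example changes slowly late in the run, which is the regime the abstract describes. The sketch does not verify the CLT or the Wasserstein rate; it only shows where the step-size exponent and the gain enter the recursion.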
Problem

Research questions and friction points this paper is trying to address.

dynamic preconditioning
Polyak-Ruppert averaging
central limit theorem
stabilization rate
stochastic approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic preconditioning
Polyak-Ruppert averaging
central limit theorem
stabilization threshold
stochastic approximation