Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

📅 2024-05-26

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses non-asymptotic statistical inference for Polyak–Ruppert averaged iterates of linear stochastic approximation (LSA), with emphasis on constructing reliable and practical confidence intervals in temporal-difference (TD) learning. Methodologically, the contribution is twofold. First, we establish a multivariate Berry–Esseen bound for the averaged iterate, the first of its kind, showing that its distribution approaches a multivariate normal distribution at a finite-sample rate of $O(1/\sqrt{n})$. Second, we propose a multiplier bootstrap procedure that can be updated online and whose confidence intervals are shown to be valid non-asymptotically, without relying on asymptotic approximations. The framework covers TD learning with linear function approximation, balancing theoretical rigor and computational efficiency. Experiments demonstrate that the proposed method significantly outperforms standard asymptotic approaches in TD learning, achieving higher accuracy, lower computational overhead, and a straightforward implementation, and the method comes with rigorous finite-sample guarantees.
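
For orientation, the objects in the summary can be written in the notation commonly used for LSA. The symbols below ($\mathbf{A}(Z)$, $\mathbf{b}(Z)$, $\bar{\mathbf{A}}$, $\Sigma_\varepsilon$, $\Sigma_\infty$, the class $\mathsf{C}$ of convex sets) are assumed standard choices, not copied from the paper, whose exact averaging window and constants may differ. A Berry–Esseen bound controls $\rho_n$ by an explicit function of $n$, the step-size parameters, and problem constants, rather than only asserting that $\rho_n \to 0$.

```latex
\begin{aligned}
\theta_k &= \theta_{k-1} - \alpha_k\bigl(\mathbf{A}(Z_k)\,\theta_{k-1} - \mathbf{b}(Z_k)\bigr), \qquad \alpha_k \downarrow 0, \\
\bar{\theta}_n &= \frac{1}{n}\sum_{k=1}^{n} \theta_k, \qquad \bar{\mathbf{A}}\,\theta^\star = \bar{\mathbf{b}}, \quad \bar{\mathbf{A}} = \mathbb{E}[\mathbf{A}(Z)], \ \bar{\mathbf{b}} = \mathbb{E}[\mathbf{b}(Z)], \\
\Sigma_\infty &= \bar{\mathbf{A}}^{-1}\,\Sigma_\varepsilon\,\bar{\mathbf{A}}^{-\top}, \qquad \Sigma_\varepsilon = \operatorname{Cov}\bigl(\mathbf{A}(Z)\,\theta^\star - \mathbf{b}(Z)\bigr), \\
\rho_n &= \sup_{B \in \mathsf{C}} \Bigl| \mathbb{P}\bigl(\sqrt{n}\,(\bar{\theta}_n - \theta^\star) \in B\bigr) - \mathbb{P}\bigl(\Sigma_\infty^{1/2}\,\eta \in B\bigr) \Bigr|, \qquad \eta \sim \mathcal{N}(0, I_d).
\end{aligned}
```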

📝 Abstract

In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
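
The online procedure described in the abstract can be sketched for TD(0) with linear features, viewed as an LSA with $\mathbf{A}(Z) = \phi(s)\,(\phi(s) - \gamma\,\phi(s'))^\top$ and $\mathbf{b}(Z) = \phi(s)\,r(s)$. This is a minimal sketch under stated assumptions, not the authors' implementation: observations are i.i.d. pairs $(s, s')$ with $s$ drawn from the stationary distribution and $s'$ from the transition kernel; the step size is $\alpha_k = c_0/(k + k_0)^{0.6}$; each perturbed copy multiplies its increment by an i.i.d. Exponential(1) weight (mean 1, variance 1); the Polyak-Ruppert average runs over all iterates; and the symmetric per-coordinate interval rule, the toy Markov reward process, and all function names (`td0_multiplier_bootstrap`, `bootstrap_intervals`, and so on) are hypothetical choices for illustration.

```python
import numpy as np


def stationary_distribution(P, n_iter=2000):
    """Stationary distribution of a row-stochastic matrix P via power iteration."""
    mu = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        mu = mu @ P
    return mu / mu.sum()


def td_fixed_point(P, r, Phi, mu, gamma):
    """TD(0) fixed point theta* solving A_bar theta = b_bar (s ~ mu, s' ~ P[s])."""
    D = np.diag(mu)
    A_bar = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b_bar = Phi.T @ D @ r
    return np.linalg.solve(A_bar, b_bar)


def td0_multiplier_bootstrap(P, r, Phi, mu, gamma, n_steps, n_boot=200,
                             c0=1.0, step_pow=0.6, k0=20, seed=0):
    """TD(0) as LSA with Polyak-Ruppert averaging, plus n_boot perturbed copies.

    Hypothetical sketch: every copy reuses the same observation Z_k = (s, s')
    and scales its increment by an i.i.d. Exponential(1) weight (assumed choice).
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)                # LSA iterate theta_k
    theta_b = np.zeros((n_boot, d))    # perturbed iterates, one row per copy
    avg = np.zeros(d)                  # running Polyak-Ruppert average
    avg_b = np.zeros((n_boot, d))      # running averages of the perturbed copies
    for k in range(1, n_steps + 1):
        alpha = c0 / (k + k0) ** step_pow              # decreasing step size
        s = rng.choice(n_states, p=mu)                 # s ~ mu (i.i.d. sampling)
        s_next = rng.choice(n_states, p=P[s])          # s' ~ P(s, .)
        phi, phi_next = Phi[s], Phi[s_next]
        A = np.outer(phi, phi - gamma * phi_next)      # A(Z_k)
        b = phi * r[s]                                 # b(Z_k)
        theta = theta - alpha * (A @ theta - b)
        w = rng.exponential(1.0, size=n_boot)          # multiplier weights
        theta_b = theta_b - alpha * w[:, None] * (theta_b @ A.T - b)
        avg += (theta - avg) / k
        avg_b += (theta_b - avg_b) / k
    return avg, avg_b


def bootstrap_intervals(avg, avg_b, delta=0.05):
    """Symmetric per-coordinate intervals from quantiles of |avg_b - avg|."""
    half = np.quantile(np.abs(avg_b - avg), 1.0 - delta, axis=0)
    return avg - half, avg + half


if __name__ == "__main__":
    # Hypothetical toy Markov reward process with 3 orthonormal features.
    rng = np.random.default_rng(42)
    P = rng.random((6, 6))
    P /= P.sum(axis=1, keepdims=True)
    r = rng.random(6)
    Phi, _ = np.linalg.qr(rng.standard_normal((6, 3)))
    mu = stationary_distribution(P)
    theta_star = td_fixed_point(P, r, Phi, mu, gamma=0.5)
    avg, avg_b = td0_multiplier_bootstrap(P, r, Phi, mu, 0.5, n_steps=10000)
    lo, hi = bootstrap_intervals(avg, avg_b)
    print("theta*:", theta_star)
    print("PR average:", avg)
    print("95% bootstrap intervals:", list(zip(lo, hi)))
```

Because every perturbed copy consumes the same observation as the main iterate, no past data need to be stored, which is what makes the procedure online. The per-step cost here is $O(n_\text{boot}\, d^2)$; since $\mathbf{A}(Z_k)$ is rank one, it could be reduced to $O(n_\text{boot}\, d)$ by applying the outer product as two inner products.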

Problem

Research questions and friction points this paper is trying to address.

Polyak-Ruppert Averaging

Berry-Esseen Bounds

Temporal Difference Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Approximation

Multiplier Bootstrap

Polyak-Ruppert Averaging

🔎 Similar Papers

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning