🤖 AI Summary
This study investigates the finite-sample distributional approximation of asynchronous Q-learning in high-dimensional settings. Under the assumption that the state-action-next-state sequence forms a uniformly geometrically ergodic Markov chain, the authors establish, for polynomial step sizes with Polyak–Ruppert averaging, a convergence rate of order up to \(n^{-1/6} \log^4(nSA)\) in the high-dimensional central limit theorem over the class of hyperrectangles. Concretely, they prove a high-dimensional central limit theorem for the averaged iterates with explicit Gaussian approximation error bounds, building on a new high-dimensional CLT for sums of martingale differences, and derive upper bounds on higher-order moments of the algorithm's last iterate. By integrating Markov chain ergodic theory, high-dimensional probabilistic limit theorems, and martingale difference analysis, this work lays a rigorous theoretical foundation for statistical inference in asynchronous reinforcement learning.
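To ground the terminology, here is a minimal sketch of the procedure being analyzed: asynchronous Q-learning along a single Markov trajectory, with polynomial stepsize \(k^{-\omega}\) and Polyak–Ruppert averaging of the iterates. The `env.reset()` / `env.step(s, a)` interface, the uniform behavior policy, and the default parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def async_q_learning_pr(env, n, S, A, gamma=0.99, omega=0.7, seed=0):
    """Asynchronous Q-learning with polynomial stepsize k**(-omega),
    omega in (1/2, 1], and Polyak-Ruppert averaging (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((S, A))       # current iterate Q_k
    Q_bar = np.zeros((S, A))   # running Polyak-Ruppert average of Q_1, ..., Q_k
    s = env.reset()            # hypothetical environment interface
    for k in range(1, n + 1):
        a = int(rng.integers(A))    # behavior policy: uniform exploration (assumed)
        s_next, r = env.step(s, a)  # one transition of the underlying Markov chain
        eta = k ** (-omega)         # polynomial stepsize k^{-omega}
        # Asynchronous update: only the visited (s, a) entry is modified.
        Q[s, a] += eta * (r + gamma * Q[s_next].max() - Q[s, a])
        Q_bar += (Q - Q_bar) / k    # online Polyak-Ruppert average
        s = s_next
    return Q_bar
```

The averaged table `Q_bar` is the quantity whose Gaussian fluctuations the paper quantifies; the asynchrony refers to each step updating a single state-action entry sampled along the trajectory, rather than all \(SA\) entries synchronously.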
📝 Abstract
In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak–Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-\omega}$, $\omega \in (1/2, 1]$. Assuming that the sequence of state-action-next-state triples $(s_k, a_k, s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, we establish a rate of order up to $n^{-1/6} \log^{4}(nSA)$ over the class of hyperrectangles, where $n$ is the number of samples used by the algorithm and $S$ and $A$ denote the numbers of states and actions, respectively. To obtain this result, we prove a high-dimensional central limit theorem for sums of martingale differences, which may be of independent interest. Finally, we present bounds for high-order moments of the algorithm's last iterate.
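Schematically, with the exact conditions, constants, and covariance structure as specified in the paper, a Gaussian approximation bound of this type over hyperrectangles can be read as

$$
\sup_{R \in \mathcal{R}} \Bigl| \mathbb{P}\bigl(\sqrt{n}\,(\bar{Q}_n - Q^{\star}) \in R\bigr) - \mathbb{P}(Z \in R) \Bigr| \;\lesssim\; n^{-1/6} \log^{4}(nSA),
$$

where $\bar{Q}_n$ denotes the Polyak–Ruppert averaged iterate, $Q^{\star}$ the optimal Q-function, $\mathcal{R}$ the class of hyperrectangles in $\mathbb{R}^{SA}$, and $Z$ a centered Gaussian vector with the corresponding asymptotic covariance. The notation $\bar{Q}_n$, $Q^{\star}$, $Z$, $\mathcal{R}$ is ours for illustration; the precise statement and normalization are given in the paper.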