Gaussian Approximation for Asynchronous Q-learning

📅 2026-04-08
🤖 AI Summary
This study quantifies how well a Gaussian approximates the distribution of Polyak-Ruppert averaged iterates of asynchronous Q-learning in high-dimensional settings. Assuming that the state-action-next-state triples form a uniformly geometrically ergodic Markov chain, and using polynomial step sizes $k^{-\omega}$ with $\omega \in (1/2, 1]$, the authors establish a Gaussian approximation rate of order up to $n^{-1/6} \log^{4}(nSA)$ over the class of hyper-rectangles, where $n$ is the number of samples and $S$ and $A$ are the numbers of states and actions. The argument rests on a high-dimensional central limit theorem for sums of martingale differences, which the authors prove for this purpose. The paper also gives bounds on high-order moments of the algorithm's last iterate. Together, these results support statistical inference for asynchronous reinforcement learning.
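To make "a rate over hyper-rectangles" concrete, the display below sketches the kind of quantity such a result typically bounds. The symbols $\bar{Q}_n$ (averaged iterate), $Q^\star$ (optimal Q-function), $\Sigma$ (limiting covariance), and $\eta$ are notation assumed here for illustration; the paper's exact normalization and covariance are not given on this page and may differ.

```latex
% Assumed notation: \bar{Q}_n, Q^\star \in \mathbb{R}^{SA}, \eta \sim \mathcal{N}(0, I_{SA}).
\[
  \rho_n
  \;=\;
  \sup_{B \in \mathcal{R}}
  \Bigl|
    \mathbb{P}\bigl(\sqrt{n}\,(\bar{Q}_n - Q^\star) \in B\bigr)
    - \mathbb{P}\bigl(\Sigma^{1/2}\eta \in B\bigr)
  \Bigr|,
  \qquad
  \mathcal{R} = \Bigl\{ \textstyle\prod_{j=1}^{SA} [a_j, b_j] \;:\; -\infty \le a_j \le b_j \le \infty \Bigr\}.
\]
% A rate "of order up to n^{-1/6} \log^4(nSA)" would then read
% \rho_n \lesssim n^{-1/6} \log^{4}(nSA), up to problem-dependent constants.
```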
📝 Abstract
In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-\omega},\, \omega \in (1/2, 1]$. Assuming that the sequence of state-action-next-state triples $(s_k, a_k, s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, we establish a rate of order up to $n^{-1/6} \log^{4}(nSA)$ over the class of hyper-rectangles, where $n$ is the number of samples used by the algorithm and $S$ and $A$ denote the numbers of states and actions, respectively. To obtain this result, we prove a high-dimensional central limit theorem for sums of martingale differences, which may be of independent interest. Finally, we present bounds for high-order moments for the algorithm's last iterate.
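The procedure analyzed in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `async_q_learning`, the tabular simulator built from a known transition tensor `P` and reward table `R`, the uniform behavior policy, the discount factor `gamma`, and the step-size constant `c0` are all hypothetical choices made here. Only the polynomial step size $k^{-\omega}$ with $\omega \in (1/2, 1]$, the single-trajectory (asynchronous) updates, and the Polyak-Ruppert average come from the text.

```python
import numpy as np


def async_q_learning(P, R, gamma, n, omega=0.75, c0=1.0, seed=0):
    """Asynchronous tabular Q-learning with Polyak-Ruppert averaging.

    P: array of shape (S, A, S), transition probabilities (assumed known
       here only so that a trajectory can be simulated).
    R: array of shape (S, A), deterministic rewards (an assumption).
    Returns the last iterate Q_n and the running average of Q_1, ..., Q_n.
    """
    if not 0.5 < omega <= 1.0:
        raise ValueError("omega must lie in (1/2, 1]")
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))       # current iterate
    Q_bar = np.zeros((S, A))   # Polyak-Ruppert average of the iterates
    s = 0                      # arbitrary initial state
    for k in range(1, n + 1):
        a = rng.integers(A)                # uniform behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])  # one transition along a single trajectory
        alpha = c0 * k ** (-omega)         # polynomial step size k^{-omega}
        # Asynchronous update: only the visited (s, a) entry changes.
        td_error = R[s, a] + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td_error
        # Incremental form of the mean (1/k) * sum_{j <= k} Q_j.
        Q_bar += (Q - Q_bar) / k
        s = s_next
    return Q, Q_bar
```

A call such as `Q_last, Q_avg = async_q_learning(P, R, gamma=0.9, n=10_000)` returns both quantities the abstract discusses: the central limit theorem concerns `Q_avg`, while the moment bounds concern `Q_last`. The incremental mean avoids storing all $n$ iterates.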
Problem

Research questions and friction points this paper is trying to address.

asynchronous Q-learning
central limit theorem
high-dimensional convergence
Polyak-Ruppert averaging
martingale differences
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous Q-learning
high-dimensional central limit theorem
Polyak-Ruppert averaging
martingale difference
convergence rate