On Gaussian approximation for entropy-regularized Q-learning with function approximation

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work derives rates of convergence in the high-dimensional central limit theorem for entropy-regularized asynchronous Q-learning with linear function approximation. Assuming that the observed transitions form a uniformly geometrically ergodic Markov chain, the authors combine a linearization of the soft Bellman recursion, a martingale-difference decomposition, and Polyak–Ruppert averaging to establish, for the first time, a Gaussian approximation bound in the convex distance for this algorithm. The main contributions are a Gaussian approximation rate of order $n^{-1/4}$ for the averaged iterates, up to polylogarithmic factors in the number of samples $n$, and high-order moment bounds for the last iterate.
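
For orientation, the convex distance referred to above is usually defined as below. This is the standard textbook definition, not a formula quoted from the paper. The symbols $\bar{\theta}_n$ (averaged iterate), $\theta^\star$ (solution of the projected soft Bellman equation), $\Sigma$ (limiting covariance) and the $\sqrt{n}$ normalization are notation assumed here for illustration.

$$
d_{\mathrm{c}}(X, Y) \;=\; \sup_{B \in \mathcal{C}(\mathbb{R}^d)} \bigl| \mathbb{P}(X \in B) - \mathbb{P}(Y \in B) \bigr|,
\qquad \mathcal{C}(\mathbb{R}^d) = \{\text{convex Borel subsets of } \mathbb{R}^d\}.
$$

Under that assumed notation, a bound of the kind described would read, schematically,

$$
d_{\mathrm{c}}\bigl(\sqrt{n}\,(\bar{\theta}_n - \theta^\star),\; \mathcal{N}(0, \Sigma)\bigr) \;\lesssim\; n^{-1/4} \quad \text{up to polylogarithmic factors in } n.
$$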

📝 Abstract

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak–Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-\omega}$, $\omega \in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.
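
To make the objects in the abstract concrete, here is a minimal sketch of the kind of recursion being analyzed: a soft (log-sum-exp) temporal-difference update with linear features, stepsize $k^{-\omega}$, and a running Polyak–Ruppert average. The exact update rule, the temperature `tau`, the discount `gamma`, the uniform behaviour policy, and the toy MDP are all assumptions made for illustration; the abstract does not specify them, and this should not be read as the authors' implementation.

```python
import math
import random


def soft_value(q_values, tau):
    """Entropy-regularized value tau * log(sum_a exp(q_a / tau)), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def soft_q_learning(n, omega=0.75, tau=0.5, gamma=0.9, seed=0):
    """Run n asynchronous soft Q-learning steps; return (last iterate, averaged iterate)."""
    rng = random.Random(seed)
    n_states, n_actions, dim = 4, 2, 3  # hypothetical toy sizes

    # Hypothetical toy MDP: random transition rows, rewards, unit-norm features.
    P, R, phi = {}, {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            w = [rng.random() + 0.1 for _ in range(n_states)]
            P[s, a] = [x / sum(w) for x in w]
            R[s, a] = rng.random()
            f = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(dot(f, f))
            phi[s, a] = [x / norm for x in f]

    theta = [0.0] * dim      # last iterate
    theta_bar = [0.0] * dim  # Polyak-Ruppert average of theta_1..theta_k
    s = 0
    for k in range(1, n + 1):
        a = rng.randrange(n_actions)  # uniform behaviour policy (assumption)
        s_next = rng.choices(range(n_states), weights=P[s, a])[0]
        # Soft TD error for the observed triple (s, a, s_next).
        q_next = [dot(phi[s_next, b], theta) for b in range(n_actions)]
        td = R[s, a] + gamma * soft_value(q_next, tau) - dot(phi[s, a], theta)
        step = k ** (-omega)  # polynomial stepsize, omega in (1/2, 1)
        theta = [t + step * td * f for t, f in zip(theta, phi[s, a])]
        theta_bar = [tb + (t - tb) / k for tb, t in zip(theta_bar, theta)]
        s = s_next
    return theta, theta_bar


if __name__ == "__main__":
    last, averaged = soft_q_learning(5000)
    print("last iterate:    ", last)
    print("averaged iterate:", averaged)
```

Only the pair being visited is updated at each step, which is what "asynchronous" refers to here. The Gaussian approximation in the abstract concerns the averaged iterate `theta_bar`, while the moment bounds concern the last iterate `theta`.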

Problem

Research questions and friction points this paper is trying to address.

entropy-regularized Q-learning

Gaussian approximation

central limit theorem

function approximation

convergence rate

Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy-regularized Q-learning

Gaussian approximation

central limit theorem