🤖 AI Summary
This work addresses the challenge of characterizing almost sure convergence rates for stochastic approximation and reinforcement learning algorithms under Markovian noise. It proposes a Lyapunov drift analysis that combines a Poisson-equation-based noise correction with Moreau envelope smoothing, and applies to algorithms whose expected updates are contractive, such as \(Q\)-learning and linear temporal difference learning. Within this framework, the authors establish almost sure convergence rates under Markov-dependent noise: for polynomially decaying step sizes \(O(n^{-\eta})\) with \(\eta \in (1/2, 1)\), the rate is arbitrarily close to \(o(n^{1-2\eta})\), and for harmonic step sizes \(O(n^{-1})\), it is arbitrarily close to \(o(n^{-1})\). The latter is close to the optimal rate \(O(n^{-1}\log\log n)\) given by the law of the iterated logarithm in a special case of i.i.d. noise.
📝 Abstract
Establishing almost sure convergence rates for stochastic approximation and reinforcement learning under Markovian noise is a fundamental theoretical challenge. We make progress on this challenge for a class of stochastic approximation algorithms whose expected updates are contractive, a setting that arises in many reinforcement learning algorithms such as $Q$-learning and linear temporal difference learning. Specifically, for a power-law learning rate $O(n^{-\eta})$ with $\eta \in (1/2, 1)$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{1 - 2\eta})$. For a harmonic learning rate $O(n^{-1})$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{-1})$, which we argue is a strong result because it is close to the optimal rate $O(n^{-1}\log\log n)$ given by the law of the iterated logarithm (for a special case of i.i.d. noise). Key to our analysis is a novel Lyapunov drift construction that applies a Poisson-equation-based correction for Markovian noise to the well-established Moreau-envelope smoothing for the contractive mapping.
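To make the setting concrete, the following is a minimal sketch of a scalar stochastic approximation iteration with a contractive expected update, driven by a two-state Markov chain and a power-law step size. Everything in it is an illustrative assumption rather than the paper's construction: the chain `P`, the reward table `REWARD`, the contraction factor `GAMMA`, and the function name `run_sa` are hypothetical, and the sketch does not implement the Lyapunov drift analysis, the Poisson-equation correction, or the Moreau envelope.

```python
import random

# Hypothetical two-state Markov chain (illustrative, not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]   # P[y][z] = probability of moving from y to z
REWARD = [0.0, 1.0]            # r(y) for y in {0, 1}
GAMMA = 0.5                    # contraction factor of the expected update

# The stationary distribution of P is (2/3, 1/3), so E[r(Y)] = 1/3 and the
# fixed point of x -> GAMMA * x + E[r(Y)] is (1/3) / (1 - GAMMA) = 2/3.
X_STAR = (1.0 / 3.0) / (1.0 - GAMMA)


def run_sa(eta, n_steps, seed=0):
    """Iterate x <- x + alpha_n * (GAMMA * x + r(Y_n) - x), alpha_n = n**(-eta).

    Y_n follows the Markov chain P, so the noise is Markovian, not i.i.d.
    Use eta in (1/2, 1) for a power-law schedule or eta = 1 for a harmonic one.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / n ** eta
        target = GAMMA * x + REWARD[y]      # noisy sample of the contractive map
        x += alpha * (target - x)
        y = 0 if rng.random() < P[y][0] else 1   # advance the Markov chain
    return x
```

For example, `abs(run_sa(0.75, 20000) - X_STAR)` gives the error of the final iterate for one sample path with $\eta = 0.75$. A single run like this only illustrates the setup; it says nothing about the almost sure rates stated above.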