🤖 AI Summary
This paper tackles the convergence analysis of nonexpansive stochastic approximation algorithms under Markovian noise, a setting that arises in average-reward reinforcement learning, where standard contraction assumptions fail. Methodologically, it introduces a tight bounding technique for the noise terms based on the Poisson equation, combining nonexpansive operator analysis, Markov chain stability theory, and stochastic approximation theory. Key contributions include: (1) the first rigorous proof that average-reward TD learning converges almost surely to a sample-path-dependent fixed point, without any contraction assumption; (2) the first tight finite-sample error bound for such nonexpansive algorithms; and (3) a unified analytical framework, both asymptotic and non-asymptotic, that removes the contraction requirement and applies to a broad class of non-contractive RL algorithms.
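For orientation, the Poisson equation invoked here is, in its standard Markov-chain form (notation below is the textbook convention, not taken from the paper): given a chain with transition kernel $P$, stationary distribution $\pi$, and a per-state function $f$, one seeks a solution $\hat{f}$ to

$$
\hat{f}(s) - (P\hat{f})(s) = f(s) - \pi(f), \qquad \pi(f) := \sum_{s'} \pi(s')\, f(s'),
$$

so that the Markovian noise $f(s_t) - \pi(f)$ can be rewritten as a telescoping term plus a martingale-difference term, which is what makes tight noise bounds possible.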
📝 Abstract
Stochastic approximation is an important class of algorithms, and a large body of prior analysis focuses on stochastic approximations driven by contractive operators, an assumption that fails in some important reinforcement learning settings. This work instead investigates stochastic approximations with merely nonexpansive operators. In particular, we study nonexpansive stochastic approximations with Markovian noise, providing both asymptotic and finite-sample analysis. Key to our analysis are a few novel bounds on the noise terms resulting from the Poisson equation. As an application, we prove, for the first time, that classical tabular average-reward temporal difference learning converges to a sample-path-dependent fixed point.
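To make the object of study concrete, below is a minimal sketch of the tabular average-reward TD(0) update the abstract refers to, run on a hypothetical two-state Markov reward process (the transition matrix, rewards, and step-size schedule are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical two-state Markov reward process (illustrative only).
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, 0.0])     # reward collected in each state

V = np.zeros(2)              # differential value estimates
r_bar = 0.0                  # running estimate of the average reward
s = 0
for t in range(1, 200_001):
    s_next = rng.choice(2, p=P[s])
    alpha = 1.0 / t ** 0.7                     # diminishing step size
    delta = r[s] - r_bar + V[s_next] - V[s]    # average-reward TD error
    V[s] += alpha * delta                      # nonexpansive value update
    r_bar += alpha * delta                     # average-reward estimate tracks delta
    s = s_next

# The stationary distribution of P is [2/3, 1/3], so the true average
# reward is 2/3; r_bar should settle near that value.
print(r_bar, V[0] - V[1])
```

Note that the value-update operator here is only nonexpansive (in the span seminorm), not contractive, which is exactly why the standard contraction-based analyses do not apply; the limiting values of `V` also depend on the sample path up to an additive constant, which is the "sample-path-dependent fixed point" phenomenon.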