🤖 AI Summary
This paper investigates the convergence of TD(0) under Markovian data, removing classical assumptions on linear function approximation, instance-dependent step sizes, and exponential mixing. Methodologically, it introduces a unified analytical framework grounded in generalized gradients and Hölder continuity. For the first time, it establishes almost-sure convergence of TD(0) for both linear and nonlinear function approximation under generic (instance-independent), non-decaying step sizes and polynomially ergodic Markov chains. The analysis integrates stochastic approximation theory, ergodicity analysis of Markov chains, and functional analysis to derive finite-sample, high-probability convergence bounds. These results provide the first rigorous theoretical guarantee for nonlinear TD methods, significantly enhancing the applicability and reliability of TD learning in weakly mixing real-world settings—such as recommender systems and financial time series—where strong mixing conditions fail.
📝 Abstract
Theoretical work on Temporal Difference (TD) learning has provided finite-sample and high-probability guarantees for data generated from Markov chains. However, these bounds typically require linear function approximation, instance-dependent step sizes, algorithmic modifications, and restrictive mixing rates. We present theoretical findings for TD learning under more applicable assumptions, including instance-independent step sizes, full data utilization, and polynomial ergodicity, applicable to both linear and non-linear functions. extbf{To our knowledge, this is the first proof of TD(0) convergence on Markov data under universal and instance-independent step sizes.} While each contribution is significant on its own, their combination allows these bounds to be effectively utilized in practical application settings. Our results include bounds for linear models and non-linear under generalized gradients and H""older continuity.