🤖 AI Summary
This paper addresses the problem of solving fixed-point equations for seminorm-contractive operators, focusing on the non-asymptotic convergence of deterministic and stochastic iterative algorithms. Methodologically, it introduces, for the first time, a non-asymptotic Lyapunov stability theory in the seminorm sense, integrating contraction analysis, Markovian noise modeling, Poisson equation techniques, and Lyapunov matrix equations. The theoretical contributions are threefold: (1) geometric convergence rates for deterministic iterates to the kernel of the seminorm; (2) finite-sample error bounds for linear stochastic approximation under Markovian noise, without the restrictive Hurwitz condition; and (3) an exact correspondence between seminorm-based stability and positive semi-definite solutions of a Lyapunov matrix equation. Together, these results provide unified, non-asymptotic, and computable convergence guarantees for average-reward reinforcement learning algorithms, including TD(λ) for policy evaluation and Q-learning for control.
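To make the deterministic result concrete, below is a minimal sketch (illustrative, not code from the paper) using the span seminorm, the canonical seminorm in average-reward RL. For a strictly positive row-stochastic matrix `P`, the affine operator T(x) = r + Px contracts in the span seminorm (its contraction factor is Dobrushin's ergodicity coefficient), so the residual span(T(x_k) − x_k) decays geometrically even though the iterates themselves are determined only up to the seminorm's kernel, the constant vectors. The instance (`P`, `r`) is a hypothetical example.

```python
import numpy as np

def span(x):
    """Span seminorm ||x||_span = max(x) - min(x); its kernel is the constant vectors."""
    return float(np.max(x) - np.min(x))

rng = np.random.default_rng(0)

# Hypothetical instance: P strictly positive and row-stochastic, so the
# Dobrushin coefficient is < 1 and T(x) = r + P x contracts in the span seminorm.
n = 5
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)

x = np.zeros(n)
for k in range(30):
    x_next = r + P @ x
    # The residual, measured in the seminorm, decays geometrically, while the
    # iterate itself may keep drifting along the kernel (by an additive constant).
    print(f"iter {k:2d}  span(x_next - x) = {span(x_next - x):.3e}")
    x = x_next
```

Note that the iterates here do not converge in any norm (they grow by roughly the average reward per step); only their distance to the kernel of the seminorm does, which is exactly the mode of convergence the paper quantifies.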
📝 Abstract
We study the problem of solving fixed-point equations for seminorm-contractive operators and establish foundational results on the non-asymptotic behavior of iterative algorithms in both deterministic and stochastic settings. Specifically, in the deterministic setting, we prove a fixed-point theorem for seminorm-contractive operators, showing that iterates converge geometrically to the kernel of the seminorm. In the stochastic setting, we analyze the corresponding stochastic approximation (SA) algorithm under seminorm-contractive operators and Markovian noise, providing a finite-sample analysis for various stepsize choices. An important benchmark for equation solving is linear systems of equations, where the convergence behavior of fixed-point iteration is closely tied to the stability of linear dynamical systems. In this special case, our results provide a complete characterization of system stability with respect to a seminorm, linking it to the existence of a positive semi-definite solution of a Lyapunov equation. In the stochastic setting, we establish a finite-sample analysis for linear Markovian SA without requiring the Hurwitz condition. Our theoretical results offer a unified framework for deriving finite-sample bounds for various reinforcement learning algorithms in the average-reward setting, including TD($\lambda$) for policy evaluation (a special case of solving a Poisson equation) and Q-learning for control.
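As an illustration of the average-reward application, the following sketch (a hypothetical instance with illustrative stepsizes, not the paper's code or constants) runs tabular average-reward TD(0) under Markovian noise on a small ergodic chain. The target is the Poisson equation v + g·1 = r + Pv, whose solution v is unique only up to an additive constant, so the error is naturally measured in the span seminorm, whose kernel is precisely the constant vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ergodic chain: P strictly positive and row-stochastic.
n = 4
P = rng.random((n, n)) + 0.2
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)

# Ground truth: stationary distribution, average reward g = pi @ r, and a
# representative solution of the Poisson equation (I - P) v = r - g * 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
g = pi @ r
v_true = np.linalg.pinv(np.eye(n) - P) @ (r - g)

span = lambda x: float(x.max() - x.min())

# Tabular average-reward TD(0) driven by a single Markovian trajectory,
# with a diminishing stepsize (one of the stepsize regimes analyzed).
v, g_hat = np.zeros(n), 0.0
s = 0
for k in range(200_000):
    alpha = 10.0 / (k + 100)
    s_next = rng.choice(n, p=P[s])
    delta = r[s] - g_hat + v[s_next] - v[s]  # TD error for the Poisson equation
    v[s] += alpha * delta
    g_hat += 0.1 * alpha * delta             # slower timescale for the average reward
    s = s_next

# Error modulo constants, i.e., distance measured in the span seminorm.
print("span-seminorm error in v:", span(v - v_true))
print("average-reward error:    ", abs(g_hat - g))
```

Since the Poisson equation pins v down only up to constants, comparing `v` to `v_true` in an ordinary norm would be meaningless; the span-seminorm error is the quantity the paper's finite-sample bounds control.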