Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise

๐Ÿ“… 2025-03-24
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work provides a finite-time mean-square error analysis of two-time-scale stochastic approximation (SA) algorithms under arbitrary norm contraction mappings and Markovian noise, extending beyond prior analyses restricted to Euclidean-norm contractions and i.i.d. noise. It introduces a technical framework that combines generalized Moreau envelopes, Poisson-equation-based bias correction, two-time-scale iterative analysis, and Polyak averaging. This yields the first unified finite-time mean-square error bound for nonlinear two-time-scale SA under arbitrary norms: $O(n^{-2/3})$ in the general setting, and $O(n^{-1})$ when the slower time scale is noise-free. The analysis gives $O(n^{-1})$ convergence for both SSP Q-learning (under the average-reward criterion) and Polyak-averaged Q-learning, and $O(n^{-2/3})$ for computing generalized Nash equilibria in strongly monotone games. The results cover core reinforcement learning scenarios, including asynchronous MDP control, Q-learning, and game-theoretic equilibrium computation.

๐Ÿ“ Abstract
Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite-time analysis of such algorithms has focused on fixed-point iterations with mappings contractive under the Euclidean norm. Motivated by applications in reinforcement learning, we give the first mean-square bound on nonlinear two-time-scale SA where the iterations have arbitrary norm contractive mappings and Markovian noise. We show that the mean-square error decays at a rate of $O(1/n^{2/3})$ in the general case, and at a rate of $O(1/n)$ in a special case where the slower time scale is noiseless. Our analysis uses the generalized Moreau envelope to handle the arbitrary norm contractions and solutions of the Poisson equation to deal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, we give the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPs under the average-reward criterion. We also obtain a rate of $O(1/n)$ for Q-Learning with Polyak averaging and provide an algorithm for learning Generalized Nash Equilibrium (GNE) in strongly monotone games which converges at a rate of $O(1/n^{2/3})$.
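To make the setting concrete, the generic shape of a two-time-scale SA iteration can be sketched as below. This is a minimal illustration only: the maps `F` and `G`, the step-size exponents, and the Gaussian noise are hypothetical placeholders, not the paper's operators or noise model (the paper treats Markovian, not i.i.d., noise).

```python
import numpy as np

def two_time_scale_sa(F, G, x0, y0, n_iters, noise_scale=0.1, rng=None):
    """Run coupled fixed-point iterations
        x_{n+1} = x_n + alpha_n * (F(x_n, y_n) - x_n + noise),
        y_{n+1} = y_n + beta_n  * (G(x_n, y_n) - y_n + noise),
    where beta_n decays faster than alpha_n, so y evolves on the
    slower time scale. Illustrative sketch with i.i.d. Gaussian noise.
    """
    rng = rng or np.random.default_rng(0)
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    for n in range(1, n_iters + 1):
        alpha = n ** -0.6  # faster-time-scale step size (placeholder exponent)
        beta = n ** -0.9   # slower-time-scale step size (placeholder exponent)
        x = x + alpha * (F(x, y) - x + noise_scale * rng.standard_normal(np.shape(x)))
        y = y + beta * (G(x, y) - y + noise_scale * rng.standard_normal(np.shape(y)))
    return x, y
```

With contractive placeholder maps such as `F(x, y) = 0.5*x + 0.2*y` and `G(x, y) = 0.2*x + 0.5*y`, both iterates drift toward the joint fixed point at the origin, with residual fluctuations controlled by the decaying step sizes.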
Problem

Research questions and friction points this paper is trying to address.

Analyzes two-time-scale SA with arbitrary norm contractions and Markovian noise
Provides first mean square bounds for nonlinear two-time-scale SA
Applies analysis to reinforcement learning and optimization algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles arbitrary norm contractions via the generalized Moreau envelope
Uses Poisson equation for Markovian noise analysis
Achieves O(1/n) rate for noiseless slower timescale
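The Polyak-averaging ingredient mentioned above is simple to state: report the running average of the iterates rather than the last iterate, which typically sharpens the convergence rate to $O(1/n)$. A minimal sketch (a generic running average, not the paper's exact algorithm):

```python
import numpy as np

def polyak_average(iterates):
    """Return the running averages bar_x_n = (1/n) * sum_{k<=n} x_k
    for a sequence of iterates (generic Polyak iterate averaging)."""
    iterates = np.asarray(iterates, float)
    cumsum = np.cumsum(iterates, axis=0)
    counts = np.arange(1, len(iterates) + 1).reshape(-1, *([1] * (iterates.ndim - 1)))
    return cumsum / counts
```

For example, `polyak_average([1.0, 2.0, 3.0])` yields `[1.0, 1.5, 2.0]`: the averaged sequence smooths out the noise in the raw iterates.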
๐Ÿ”Ž Similar Papers
No similar papers found.
Siddharth Chandak
Stanford University
Multi-Agent Learning · Reinforcement Learning · Game Theory · Stochastic Approximation
Shaan Ul Haque
Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
Nicholas Bambos
Department of Electrical Engineering, Stanford University, USA