Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL

📅 2026-05-08
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses two obstacles in model-free distributionally robust reinforcement learning, both caused by the nonlinearity of the robust Bellman operator: single-sample updates are biased, and robust evaluation is computationally expensive. Under a small-radius KL divergence ambiguity set, the authors derive an approximate robust Bellman equation via a first-order Taylor expansion and propose the Mean-Variance Stochastic Approximation (MVSA) algorithm, which requires only a single sample per update. By combining two-time-scale stochastic approximation with state-space augmentation (a lifted iterate), MVSA avoids the bias of naive single-sample updates and comes with a provable central limit theorem. The analysis shows that the main iterate of MVSA converges at the standard $n^{-1/2}$ rate, with an explicitly characterized asymptotic covariance. A numerical experiment supports both the algorithm's effectiveness and its agreement with the theoretical predictions.
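The first-order expansion mentioned above can be checked numerically on a toy distribution. The sketch below is an illustration, not the paper's construction: it assumes the ambiguity set is {Q : KL(Q || P) <= delta}, evaluates the worst-case mean through the standard KL dual on a grid, and compares it with the familiar small-radius mean-variance approximation E_P[V] - sqrt(2 * delta * Var_P(V)). The function names, the test distribution, and the grid are all hypothetical choices; the paper's exact parameterization of the radius may differ.

```python
import numpy as np

def kl_robust_mean_dual(p, v, delta, lams=None):
    """Worst-case mean inf_{Q: KL(Q||P) <= delta} E_Q[v], computed from the dual
    sup_{lam > 0} -lam * log E_P[exp(-v / lam)] - lam * delta on a log-spaced grid."""
    if lams is None:
        lams = np.logspace(-3, 3, 20001)  # hypothetical grid, wide enough for this toy
    z = -v[None, :] / lams[:, None]
    zmax = z.max(axis=1, keepdims=True)   # log-sum-exp shift for numerical stability
    log_mgf = zmax[:, 0] + np.log((p[None, :] * np.exp(z - zmax)).sum(axis=1))
    return float(np.max(-lams * log_mgf - lams * delta))

def mean_variance_approx(p, v, delta):
    """Small-radius approximation: mean minus sqrt(2 * delta * variance)."""
    mean = float(p @ v)
    var = float(p @ (v - mean) ** 2)
    return mean - float(np.sqrt(2.0 * delta * var))

# Toy nominal distribution P and value vector V (illustrative numbers only).
p = np.array([0.2, 0.3, 0.4, 0.1])
v = np.array([1.0, 2.0, 0.0, 3.0])

gaps = {}
for delta in (1e-2, 1e-3, 1e-4):
    exact = kl_robust_mean_dual(p, v, delta)
    approx = mean_variance_approx(p, v, delta)
    gaps[delta] = abs(exact - approx)
    print(f"delta={delta:g}  dual={exact:.6f}  mean-variance={approx:.6f}")
```

The gap between the two columns should shrink as delta decreases, which is what "first-order accurate in the ambiguity radius" suggests. Note that the approximation needs only the mean and variance of V under P, with no inner optimization.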
πŸ“ Abstract
Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, which makes one-sample Bellman updates biased, while the adversarial optimization underlying robustness makes robust evaluation computationally demanding. To address these difficulties, we consider the natural small-ambiguity regime under Kullback-Leibler ambiguity sets and propose an approximate DRRL framework based on a first-order expansion of the relevant robust functional. This yields an approximate robust Bellman equation that removes the adversarial optimization while remaining first-order accurate in the ambiguity radius. To learn the fixed point of this approximate equation, we propose Mean-Variance Stochastic Approximation (MVSA), a model-free algorithm that uses only one-sample updates. This is achieved via lifted stochastic approximation dynamics and a two-time-scale design. We then prove convergence of MVSA and show that its main iterate satisfies a central limit theorem at the canonical $n^{-1/2}$ scale, with explicitly characterized asymptotic covariances. Finally, we validate our theoretical findings with a numerical experiment.
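To see why a lifted, two-time-scale design helps with one-sample updates, consider a toy target of the form mean minus c times standard deviation. A single sample has zero empirical variance, so plugging it directly into a nonlinear functional is biased. The sketch below is not the paper's MVSA algorithm; it is a minimal i.i.d. illustration in which auxiliary iterates for the first and second moments run on a faster step size, and the main iterate uses them on a slower one. The step-size exponents (0.6 and 1), the Gaussian data, and all names are assumptions made for this example.

```python
import numpy as np

def two_time_scale_mean_variance(samples, c):
    """Track theta = E[X] - c * sqrt(Var[X]) using one sample per step.

    Fast iterates (m, s) estimate E[X] and E[X^2]; the slow iterate theta
    consumes the current variance estimate instead of a single-sample plug-in.
    """
    m, s, theta = 0.0, 0.0, 0.0
    for n, x in enumerate(samples, start=1):
        fast = n ** -0.6          # larger step size: auxiliary (lifted) iterates
        slow = 1.0 / n            # smaller step size: main iterate
        m += fast * (x - m)
        s += fast * (x * x - s)
        std = np.sqrt(max(s - m * m, 0.0))  # clip to guard against tiny negatives
        theta += slow * (x - c * std - theta)
    return theta, m, s

rng = np.random.default_rng(0)
mu, sigma, c = 1.0, 2.0, 0.1       # illustrative values only
samples = rng.normal(mu, sigma, size=100_000)

theta, m, s = two_time_scale_mean_variance(samples, c)
target = mu - c * sigma
print(f"theta={theta:.4f}  target={target:.4f}  variance estimate={s - m * m:.4f}")
```

Each step touches one sample, yet the main iterate approaches the nonlinear target because the moment estimates equilibrate on the faster time scale. In the paper's setting the iterates would be value functions indexed by state and action rather than scalars.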
Problem

Research questions and friction points this paper is trying to address.

Distributionally Robust Reinforcement Learning
Robust Bellman Operator
Adversarial Optimization
Model-free Algorithm
Bias in One-sample Updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributionally Robust Reinforcement Learning
Two-Time-Scale Stochastic Approximation
Central Limit Theorem
Approximate Robust Bellman Equation
Mean-Variance Stochastic Approximation