🤖 AI Summary
Finite-sample theoretical analysis for policy evaluation in robust average-reward Markov decision processes (MDPs) has long been lacking. Method: This paper establishes the first rigorous sample-complexity characterization by proving that the robust Bellman operator is a contraction under the span semi-norm, and then designing a geometrically truncated multilevel Monte Carlo (MLMC) estimator based on this property. The estimator achieves exponentially decaying bias and controlled variance. Results: The proposed framework attains an order-optimal sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$. Unlike prior works, which offer only asymptotic convergence guarantees, this study fills a fundamental theoretical gap in finite-sample analysis for robust policy evaluation, strengthening both the interpretability and the practical applicability of robust reinforcement learning algorithms.
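For reference, the span semi-norm and the type of contraction bound referred to above can be written as follows; the contraction factor $\beta \in (0,1)$ and the conditions on the uncertainty set under which the bound holds are generic placeholders here, not details taken from the paper:

$$\mathrm{sp}(v) \;=\; \max_{s \in \mathcal{S}} v(s) \;-\; \min_{s \in \mathcal{S}} v(s), \qquad \mathrm{sp}\!\left(\mathbf{T}^{\pi}_{\mathcal{P}} v - \mathbf{T}^{\pi}_{\mathcal{P}} u\right) \;\le\; \beta\,\mathrm{sp}(v - u),$$

where $\mathbf{T}^{\pi}_{\mathcal{P}}$ denotes the robust Bellman operator for policy $\pi$ under uncertainty set $\mathcal{P}$, and $u, v$ are candidate value functions.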
📝 Abstract
We present the first finite-sample analysis for policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior works in this setting have established only asymptotic convergence guarantees, leaving open the question of sample complexity. In this work, we address this gap by establishing that the robust Bellman operator is a contraction under the span semi-norm, and developing a stochastic approximation framework with controlled bias. Our approach builds upon Multi-Level Monte Carlo (MLMC) techniques to estimate the robust Bellman operator efficiently. To overcome the infinite expected sample complexity inherent in standard MLMC, we introduce a truncation mechanism based on a geometric distribution, ensuring a finite, constant expected sample complexity while maintaining a small bias that decays exponentially with the truncation level. Our method achieves the order-optimal sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for robust policy evaluation and robust average-reward estimation, marking a significant advancement in robust reinforcement learning theory.
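To make the geometric truncation idea concrete, below is a minimal, self-contained sketch (not the paper's implementation) of a geometrically truncated MLMC estimator for a nonlinear functional $f(\mathbb{E}[X])$, which stands in for the worst-case operation inside a robust Bellman backup. The names `sample_fn`, `f`, `p`, and `n_max` are illustrative assumptions: the level is drawn from a geometric distribution capped at `n_max`, which keeps the expected number of samples per call finite while the omitted tail levels contribute a bias that shrinks geometrically in `n_max`, mirroring the exponential bias decay described above.

```python
import numpy as np

def truncated_mlmc_estimate(sample_fn, f, p=0.5, n_max=10, rng=None):
    """Geometrically truncated MLMC estimate of f(E[X]).

    sample_fn(k, rng) -> array of k i.i.d. samples of X.
    f                 -> nonlinear functional (stand-in for the worst-case
                         operation inside a robust Bellman backup).
    p                 -> success probability of the geometric level distribution.
    n_max             -> truncation level; the residual bias decays
                         geometrically in n_max.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Draw a level from a geometric distribution, capped at n_max.
    n = min(rng.geometric(p) - 1, n_max)            # n in {0, ..., n_max}
    # Probability mass of the drawn (capped) level, used for reweighting.
    pmf = p * (1 - p) ** n if n < n_max else (1 - p) ** n_max

    # 2^(n+1) samples at the chosen level.
    x = sample_fn(2 ** (n + 1), rng)

    # Multilevel correction: full average vs. average of the two halves.
    full = f(x.mean())
    half = 0.5 * (f(x[: 2 ** n].mean()) + f(x[2 ** n :].mean()))
    correction = (full - half) / pmf

    # Coarsest-level (single-sample) estimate plus reweighted correction.
    base = f(sample_fn(1, rng).mean())
    return base + correction

# Example usage: estimate f(E[X]) for f(m) = max(m, 0) and X ~ N(1, 1),
# averaging many independent truncated-MLMC calls.
rng = np.random.default_rng(0)
est = np.mean([
    truncated_mlmc_estimate(lambda k, r: r.normal(1.0, 1.0, size=k),
                            lambda m: max(m, 0.0), p=0.5, n_max=8, rng=rng)
    for _ in range(10_000)
])
print(est)  # should be close to f(E[X]) = 1.0, up to the truncation bias
```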