🤖 AI Summary
This paper studies safe robust policy learning in robust constrained Markov decision processes (RCMDPs) with average-cost criteria. Addressing two fundamental theoretical challenges—the absence of strong duality and the non-contractivity of the robust Bellman operator—the authors propose the first primal-only actor-critic algorithm for this setting: it eschews Lagrangian duality entirely, instead combining policy gradients with robust dynamic programming and a monotonicity analysis to handle non-contractivity. The method is the first to achieve simultaneous ε-feasibility and ε-optimality guarantees in average-cost RCMDPs. Under a constraint-slackness relaxation, the sample complexity is Õ(ε⁻⁴); without relaxation, it is Õ(ε⁻⁶), matching the best-known rates for discounted RCMDPs.
📝 Abstract
In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of standard primal-dual methods for constrained RL. Additional difficulties arise from the average-cost setting, where the robust Bellman operator is not a contraction under any norm. To address these challenges, we propose an actor-critic algorithm for Average-Cost RCMDPs. We show that our method achieves both $\epsilon$-feasibility and $\epsilon$-optimality, and we establish sample complexities of $\tilde{O}(\epsilon^{-4})$ and $\tilde{O}(\epsilon^{-6})$ with and without a slackness assumption, respectively, which is comparable to the discounted setting.