🤖 AI Summary
This paper studies safe robust policy learning in robust constrained Markov decision processes (RCMDPs) with average-cost criteria. Addressing two fundamental theoretical challenges—the absence of strong duality and the non-contractivity of the robust Bellman operator—the authors propose the first primal-only actor-critic algorithm for this setting: it eschews Lagrangian duality entirely, instead combining policy gradients with robust dynamic programming and a monotonicity analysis to handle non-contractivity. The method is the first to achieve simultaneous ε-feasibility and ε-optimality guarantees in average-cost RCMDPs. Under a constraint-slackness relaxation, the sample complexity is Õ(ε⁻⁴); without relaxation, it is Õ(ε⁻⁶), matching the best-known rates for discounted RCMDPs.
📝 Abstract
In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of standard primal-dual methods for constrained RL. Additional difficulties arise from the average-cost setting, where the robust Bellman operator is not a contraction under any norm. To address these challenges, we propose an actor-critic algorithm for Average-Cost RCMDPs. We show that our method achieves both $\epsilon$-feasibility and $\epsilon$-optimality, and we establish sample complexities of $\tilde{O}(\epsilon^{-4})$ and $\tilde{O}(\epsilon^{-6})$ with and without a slackness assumption, respectively, which is comparable to the discounted setting.