Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the long-run average-cost reinforcement learning problem with inequality constraints. It proposes, for the first time, a natural critic-actor algorithm with function approximation for this setting and establishes finite-time convergence guarantees, filling a critical gap: prior work lacked non-asymptotic guarantees under both the average-cost criterion and constraint satisfaction. The method combines two-timescale stochastic approximation, linear function approximation, and projected gradient-based constrained optimization to jointly update policy, value-function, and Lagrange-multiplier estimates. The analysis yields theoretically grounded optimal step-size schedules and an improved sample-complexity bound. Empirical evaluation on Safety-Gym environments shows performance competitive with state-of-the-art constrained RL algorithms. The core contribution is the first non-asymptotic convergence analysis of natural critic-actor methods in the constrained average-cost setting, combining rigorous theoretical foundations with practical implementability.
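To make the summary concrete, here is a minimal sketch of one iteration of a two-timescale constrained critic-actor update with linear function approximation. This is an illustrative reconstruction under stated assumptions, not the paper's exact recursion: the step-size exponents, the compatible-feature natural-gradient form, and all variable names (`phi_s`, `psi`, `rho`, etc.) are assumptions for the sake of the example. Note that in the critic-actor scheme the usual timescales are reversed, so the actor runs on the faster timescale.

```python
import numpy as np

def critic_actor_step(v, theta, lam, rho, t, phi_s, phi_next, psi, cost, cons):
    """One iteration of a two-timescale constrained natural critic-actor
    update (illustrative sketch only).

    v        : critic weights (linear value-function approximation)
    theta    : actor / policy parameters
    lam      : Lagrange multiplier for one inequality constraint (kept >= 0)
    rho      : running estimate of the long-run average Lagrangian cost
    phi_*    : state feature vectors; psi : score / compatible-feature vector
    cost     : single-stage cost; cons : single-stage constraint cost
    """
    # Hypothetical step-size schedules; the critic-actor scheme reverses the
    # usual roles, so the actor uses the faster (larger) step size.
    a_t = (t + 1) ** -0.6    # actor (fast timescale)
    b_t = (t + 1) ** -0.9    # critic (slow timescale)
    c_t = (t + 1) ** -1.0    # Lagrange multiplier (slowest timescale)

    # Lagrangian relaxation folds the inequality constraint into the cost.
    lag_cost = cost + lam * cons
    rho = rho + b_t * (lag_cost - rho)          # average-cost estimate

    # TD error for the average-cost (differential value function) setting.
    delta = lag_cost - rho + phi_next @ v - phi_s @ v

    # Critic update with linear function approximation.
    v = v + b_t * delta * phi_s

    # Natural-gradient actor step: with compatible features the natural
    # policy gradient is proportional to delta * psi (Fisher inverse absorbed).
    theta = theta - a_t * delta * psi

    # Projected gradient ascent keeps the multiplier nonnegative.
    lam = max(0.0, lam + c_t * cons)
    return v, theta, lam, rho
```

The projection `max(0.0, ...)` on the multiplier is the projected-gradient ingredient mentioned in the summary; driving `lam` up whenever the constraint cost is positive penalizes constraint violation in subsequent actor and critic updates.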

📝 Abstract
Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular representation, where the usual roles of the actor and critic are reversed. However, only asymptotic convergence was established there. Subsequently, both asymptotic and non-asymptotic analyses of the critic-actor algorithm with linear function approximation were conducted. In our work, we introduce the first natural critic-actor algorithm with function approximation for the long-run average cost setting under inequality constraints. We provide non-asymptotic convergence guarantees for this algorithm. Our analysis establishes optimal learning rates, and we also propose a modification that enhances sample complexity. We further present experiments on three different Safety-Gym environments, in which our algorithm is competitive with other well-known algorithms.
Problem

Research questions and friction points this paper is trying to address.

Develops natural critic-actor algorithm for constrained average-cost problems
Establishes non-asymptotic convergence guarantees with optimal learning rates
Enhances sample complexity for reinforcement learning with constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Natural critic-actor algorithm with function approximation
Non-asymptotic convergence guarantees with optimal rates
Modified algorithm to enhance sample complexity
Prashansa Panda
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
Shalabh Bhatnagar
Professor in the Department of Computer Science and Automation, Indian Institute of Science
Stochastic systems · control · simulation · optimization