Polynomial-Time Approximability of Constrained Reinforcement Learning

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the polynomial-time approximate solution of general constrained Markov decision processes (CMDPs), focusing on optimal policy computation under recursively computable constraints—including almost-sure, chance, expected-cost, and non-homogeneous mixed constraints. We propose a unified framework integrating linear programming relaxation, dual feasibility analysis, and constraint decomposition. This yields the first $(0,\varepsilon)$-additive bi-criteria approximation algorithm, applicable to both discrete and continuous state spaces, stochastic or deterministic policies, and arbitrary combinations of constraint types. We prove that the algorithm achieves the optimal approximation ratio in polynomial time and establish a matching NP-hardness lower bound. Consequently, our approach resolves several long-standing open problems—including chance-constrained CMDPs, multi-objective expected-cost constraints, and non-homogeneous mixed constraints—achieving theoretical optimality under the assumption $\mathrm{P} \neq \mathrm{NP}$.
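The summary names linear programming relaxation over constraint decompositions as the core tool. As background, a minimal sketch of the classical occupancy-measure LP for a discounted CMDP with a single expected-cost constraint — all numbers are illustrative and not taken from the paper, and this is the textbook formulation, not the paper's full algorithm:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action discounted CMDP (illustrative numbers).
nS, nA, gamma = 2, 2, 0.9
P = np.zeros((nS, nA, nS))               # P[s, a, s'] transition kernel
P[0, 0] = [0.8, 0.2]
P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.3, 0.7]
r = np.array([[1.0, 2.0], [0.5, 3.0]])   # rewards (to maximize)
c = np.array([[0.0, 1.0], [0.2, 2.0]])   # costs (expectation-constrained)
mu0 = np.array([1.0, 0.0])               # initial state distribution
budget = 5.0                             # bound on expected discounted cost

# Flow (Bellman-flow) constraints on occupancy measures x[s, a]:
#   sum_a x[s', a] - gamma * sum_{s,a} P[s, a, s'] * x[s, a] = mu0[s']
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = float(sp == s) - gamma * P[s, a, sp]

res = linprog(
    c=-r.ravel(),                        # linprog minimizes, so negate rewards
    A_ub=c.ravel()[None, :], b_ub=[budget],
    A_eq=A_eq, b_eq=mu0,
    bounds=[(0, None)] * (nS * nA),
)
x = res.x.reshape(nS, nA)
# Recover a stochastic policy: pi(a | s) proportional to x[s, a].
pi = x / x.sum(axis=1, keepdims=True)
```

The optimal policy here is stochastic in general; the paper's contribution concerns the harder settings (chance constraints, deterministic policies under multiple expectation constraints, non-homogeneous and continuous-state cases) where this plain LP is not sufficient.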

📝 Abstract
We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.
Problem

Research questions and friction points this paper is trying to address.

Computational complexity of constrained Markov decision processes
Polynomial-time approximation for optimal constrained policies
Addressing long-standing complexity questions in reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial-time bicriteria approximation algorithm for constrained MDPs
Handles diverse recursively computable constraints in a unified framework
Resolves long-standing complexity questions in constrained RL