🤖 AI Summary
This work investigates the sensitivity of optimal value functions and policy regret in constrained Markov decision processes (CMDPs) to perturbations in the initial state distribution, a critical robustness concern. To address this, we propose a novel theoretical framework unifying Lagrangian duality with linear programming perturbation analysis. Our main contribution is the first derivation of tight upper and lower bounds on the optimal value as a function of initial distribution shifts. These bounds explicitly characterize the influence of constraint violation magnitude, discount factor, and transition structure on distributional sensitivity. Leveraging these bounds, we develop a computationally tractable method for quantifying policy regret under distributional shift. Empirical evaluation demonstrates that our bounds effectively guide robust policy selection, significantly improving performance guarantees when the true or deployed initial distribution deviates from the nominal one.
📝 Abstract
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
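The abstract's appeal to linear programming perturbation analysis can be made concrete through the standard occupancy-measure LP for discounted CMDPs. The sketch below uses a common textbook normalization with reward $r$, constraint costs $c_i$, thresholds $\tau_i$, and discount factor $\gamma$; this is an illustrative formulation and need not match the exact conventions used in the paper.

```latex
% CMDP as a linear program over occupancy measures d(s,a).
% The initial state distribution \mu appears only in the
% right-hand side of the flow (Bellman flow) constraints.
\begin{align*}
V^*(\mu) \;=\; \max_{d \,\ge\, 0}\;
  & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.}\;
  & \sum_{a} d(s',a)
    \;=\; (1-\gamma)\,\mu(s')
    \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a)
    \qquad \forall\, s', \\
  & \sum_{s,a} d(s,a)\, c_i(s,a) \;\le\; \tau_i
    \qquad i = 1,\dots,m.
\end{align*}
```

Because $\mu$ enters only the right-hand side of the flow constraints, a shift from $\mu$ to $\mu'$ is a right-hand-side perturbation of the LP, which is precisely the setting where LP sensitivity analysis and Lagrangian duality yield bounds on the change in $V^*(\mu)$.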