🤖 AI Summary
This work investigates the sensitivity of optimal value functions and policy regret in constrained Markov decision processes (CMDPs) to perturbations in the initial state distribution, a critical robustness concern. To address this, we propose a novel theoretical framework unifying Lagrangian duality with linear programming perturbation analysis. Our main contribution is the first derivation of tight upper and lower bounds on the optimal value as a function of initial distribution shifts. These bounds explicitly characterize the influence of constraint violation magnitude, discount factor, and transition structure on distributional sensitivity. Leveraging these bounds, we develop a computationally tractable method for quantifying policy regret under distributional shift. Empirical evaluation demonstrates that our bounds effectively guide robust policy selection, significantly improving performance guarantees when the true or deployed initial distribution deviates from the nominal one.
📝 Abstract
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
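The abstract's appeal to linear programming perturbation analysis can be made concrete through the standard occupancy-measure LP for discounted CMDPs. The sketch below uses a common textbook normalization with reward $r$, constraint costs $c_i$, thresholds $\tau_i$, and discount factor $\gamma$; this is an illustrative formulation and need not match the exact conventions used in the paper.

```latex
% CMDP as a linear program over occupancy measures d(s,a).
% The initial state distribution \mu appears only in the
% right-hand side of the flow (Bellman flow) constraints.
\begin{align*}
V^*(\mu) \;=\; \max_{d \,\ge\, 0}\;
  & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.}\;
  & \sum_{a} d(s',a)
    \;=\; (1-\gamma)\,\mu(s')
    \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a)
    \qquad \forall\, s', \\
  & \sum_{s,a} d(s,a)\, c_i(s,a) \;\le\; \tau_i
    \qquad i = 1,\dots,m.
\end{align*}
```

Because $\mu$ enters only the right-hand side of the flow constraints, a shift from $\mu$ to $\mu'$ is a right-hand-side perturbation of the LP, which is precisely the setting where LP sensitivity analysis and Lagrangian duality yield bounds on the change in $V^*(\mu)$.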