Initial Distribution Sensitivity of Constrained Markov Decision Processes

📅 2025-09-30
🤖 AI Summary
This work investigates the sensitivity of optimal value functions and policy regret in constrained Markov decision processes (CMDPs) to perturbations in the initial state distribution, a critical robustness concern. To address this, we propose a novel theoretical framework unifying Lagrangian duality with linear programming perturbation analysis. Our main contribution is the first derivation of tight upper and lower bounds on the optimal value as a function of initial distribution shifts. These bounds explicitly characterize the influence of constraint violation magnitude, discount factor, and transition structure on distributional sensitivity. Leveraging these bounds, we develop a computationally tractable method for quantifying policy regret under distributional shift. Empirical evaluation demonstrates that our bounds effectively guide robust policy selection, significantly improving performance guarantees when the true or deployed initial distribution deviates from the nominal one.

๐Ÿ“ Abstract
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
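For orientation, the standard occupancy-measure linear program for a discounted CMDP shows where the initial distribution enters. The notation here is an assumption and is not taken from the paper: reward r, constraint cost d, budget c, discount factor \gamma, transition kernel P, initial distribution \mu, and a reward-maximizing convention (the paper may instead minimize cost).

```latex
% Primal: x(s,a) is the discounted state-action occupancy measure.
\begin{align*}
V^*(\mu) \;=\; \max_{x \ge 0}\quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.}\quad & \sum_{a} x(s',a) \;-\; \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) \;=\; \mu(s') \quad \forall s', \\
& \sum_{s,a} d(s,a)\, x(s,a) \;\le\; c .
\end{align*}

% Dual: v(s) prices the flow constraints, \lambda prices the budget.
\begin{align*}
V^*(\mu) \;=\; \min_{v,\ \lambda \ge 0}\quad & \sum_{s} \mu(s)\, v(s) \;+\; \lambda\, c \\
\text{s.t.}\quad & v(s) \;-\; \gamma \sum_{s'} P(s' \mid s,a)\, v(s') \;+\; \lambda\, d(s,a) \;\ge\; r(s,a) \quad \forall s,a .
\end{align*}
```

In this formulation \mu appears only on the right-hand side of the primal flow constraints, so changing the initial distribution is a right-hand-side perturbation of the LP. Equivalently, \mu appears only in the dual objective while the dual feasible set is fixed, so wherever the primal is feasible V^*(\mu) is a minimum of functions linear in \mu, and therefore concave and piecewise linear in \mu.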
Problem

Research questions and friction points this paper is trying to address.

Analyzing sensitivity of CMDP optimal value to initial distributions
Deriving variation bounds using duality and perturbation analysis
Evaluating policy regret from unknown initial distribution changes
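The three questions above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's method: it uses a made-up 2-state, 2-action CMDP and a brute-force grid search over stationary randomized policies in place of the LP and duality machinery. All numbers, names (`evaluate`, `solve`, `BUDGET`), and the reward-maximizing convention are assumptions chosen for illustration.

```python
from itertools import product

GAMMA = 0.9
# P[s][a] = (prob. of next state 0, prob. of next state 1); all values are made up.
P = [[(0.9, 0.1), (0.2, 0.8)],
     [(0.5, 0.5), (0.1, 0.9)]]
R = [[1.0, 2.0], [0.0, 3.0]]  # reward r(s, a)
D = [[0.0, 1.0], [0.0, 1.0]]  # constraint cost d(s, a); action 0 is cost-free
BUDGET = 2.0                  # bound on expected discounted constraint cost


def evaluate(policy, payoff):
    """Per-state discounted value of `payoff` under a stationary policy.

    policy[s] is the probability of taking action 1 in state s.
    """
    # Policy-averaged one-step payoff and transition matrix.
    g = [(1 - policy[s]) * payoff[s][0] + policy[s] * payoff[s][1] for s in range(2)]
    T = [[(1 - policy[s]) * P[s][0][j] + policy[s] * P[s][1][j] for j in range(2)]
         for s in range(2)]
    # Solve (I - GAMMA * T) v = g for the 2x2 case with Cramer's rule.
    a, b = 1 - GAMMA * T[0][0], -GAMMA * T[0][1]
    c, d = -GAMMA * T[1][0], 1 - GAMMA * T[1][1]
    det = a * d - b * c
    return ((g[0] * d - b * g[1]) / det, (a * g[1] - c * g[0]) / det)


def dot(mu, v):
    """Expected value under initial distribution mu."""
    return mu[0] * v[0] + mu[1] * v[1]


def solve(mu, steps=50):
    """Best feasible grid policy for initial distribution mu: (value, policy)."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for policy in product(grid, repeat=2):
        if dot(mu, evaluate(policy, D)) <= BUDGET + 1e-9:
            value = dot(mu, evaluate(policy, R))
            if best is None or value > best[0]:
                best = (value, policy)
    return best


mu_nominal, mu_shifted = (0.9, 0.1), (0.1, 0.9)
v_nom, pi_nom = solve(mu_nominal)
v_shift, pi_shift = solve(mu_shifted)

# Deploy the nominal-optimal policy when the true initial distribution has shifted.
deployed_value = dot(mu_shifted, evaluate(pi_nom, R))
deployed_cost = dot(mu_shifted, evaluate(pi_nom, D))
regret = v_shift - deployed_value

print("nominal optimum:", v_nom, "policy:", pi_nom)
print("shifted optimum:", v_shift, "policy:", pi_shift)
print("nominal policy under shift: value", deployed_value, "cost", deployed_cost)
print("value gap vs. shifted optimum:", regret)
```

Comparing `pi_nom` with `pi_shift` shows whether the best feasible policy changes with the initial distribution. The deployed cost is worth checking alongside the value gap: a policy tuned for the nominal distribution can exceed the budget under the shifted one, in which case the value gap alone is not a meaningful regret.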
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes CMDP sensitivity to initial distributions
Uses duality and perturbation analysis techniques
Derives bounds for policy regret evaluation