🤖 AI Summary
This work addresses the lack of provable robustness guarantees for policies in parametric stochastic environments. We propose the first framework integrating scenario optimisation with interval Markov decision processes (IMDPs). Under dual uncertainty, namely unknown parameter distributions and modelling errors in single-environment abstractions, we model the environment as a parametric MDP, learn an IMDP to approximate a collection of environments, and, as the key novelty, incorporate scenario optimisation into the robustness analysis to jointly quantify both sources of uncertainty. Theoretically, we establish a probably approximately correct (PAC) joint guarantee on policy performance and risk. Empirically, our method yields tighter, higher-confidence performance bounds across multiple benchmarks, significantly outperforming conventional single-environment IMDP approaches.
📝 Abstract
We present a data-driven approach for producing policies that are provably robust across unknown stochastic environments. Existing approaches can learn a model of a single environment as an interval Markov decision process (IMDP) and produce a robust policy with a probably approximately correct (PAC) guarantee on its performance. However, these are unable to reason about the impact of the environmental parameters underlying the uncertainty. We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse IMDPs for a set of unknown sample environments induced by parameters. The key challenge is then to produce meaningful performance guarantees that combine the two layers of uncertainty: (1) multiple environments induced by parameters with an unknown distribution; (2) unknown induced environments which are approximated by IMDPs. We present a novel approach based on scenario optimisation that yields a single PAC guarantee quantifying the risk level for which a specified performance level can be assured in unseen environments, plus a means to trade off risk and performance. We implement and evaluate our framework using multiple robust policy generation methods on a range of benchmarks. We show that our approach produces tight bounds on a policy's performance with high confidence.
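To give a flavour of the kind of guarantee scenario optimisation provides, the sketch below computes the classic single-level scenario risk bound: if a performance level holds in all `N` sampled environments, then with confidence `1 - beta` it is violated by an unseen environment with probability at most `eps`, where `(1 - eps)^N <= beta`. This is only a minimal illustration of the underlying principle, not the paper's actual two-layer guarantee, and the function name is our own.

```python
import math

def scenario_risk_bound(num_samples: int, confidence: float) -> float:
    """Risk level eps for the simplest scenario-optimisation bound.

    If a property holds in all `num_samples` i.i.d. sampled environments,
    then with probability >= `confidence` (over the sampling), an unseen
    environment violates it with probability at most eps, where
    (1 - eps)^N <= beta and beta = 1 - confidence.
    Solving for eps gives eps = 1 - beta**(1/N).
    """
    if not (0.0 < confidence < 1.0) or num_samples < 1:
        raise ValueError("need 0 < confidence < 1 and num_samples >= 1")
    beta = 1.0 - confidence
    return 1.0 - beta ** (1.0 / num_samples)

# With 100 sampled environments and 99% confidence, the certified
# violation risk is roughly 4.5%; more samples tighten the bound.
print(scenario_risk_bound(100, 0.99))
print(scenario_risk_bound(1000, 0.99))
```

The trade-off between risk and performance mentioned in the abstract arises because discarding the worst-performing sampled environments improves the certified performance level at the cost of a larger (binomial-tail) risk bound; the closed form above covers only the no-discarding case.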