Robust Parameter Learning for Uncertain MDPs

📅 2026-05-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
Traditional uncertain MDP approaches quantify uncertainty independently for each transition probability and ignore the algebraic dependencies induced by shared latent quantities, which makes the resulting models over-conservative. This paper instead learns parametric MDPs (pMDPs), in which transition probabilities are expressions over a set of parameters. Statistical uncertainty from empirical transition frequencies is projected onto the parameter space, yielding a probably approximately correct (PAC) uncertainty model that respects the algebraic dependencies between transitions. Because the resulting models are algorithmically challenging to solve, the authors propose a hierarchy of sound polytopic outer approximations of the induced confidence set. In the reported evaluation, the approach gives substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
📝 Abstract
Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. These models use, for example, confidence intervals to capture transition uncertainty and allow synthesis of policies that are robust to this uncertainty. However, this approach typically quantifies uncertainty independently for individual transition probabilities, ignoring dependencies due to shared latent quantities. We propose to learn such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. We project statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP that respects the algebraic dependencies between transitions. The resulting models are algorithmically challenging to solve, so we propose a hierarchy of sound polytopic outer approximations of the induced confidence set. We implement and evaluate our approach, demonstrating substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
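To make the idea of projecting interval uncertainty onto a parameter space concrete, here is a minimal sketch. It is not the paper's algorithm. Everything in it is an assumption for illustration: the two-transition pMDP fragment, the expressions `p` and `p * p`, the observation counts, the use of Hoeffding intervals with a union bound, and the grid search over the parameter. The grid search is only a sampling of the parameter confidence set, whereas the paper works with sound polytopic outer approximations.

```python
import math


def hoeffding_interval(successes, trials, delta):
    """Two-sided Hoeffding interval for a Bernoulli mean, clipped to [0, 1]."""
    p_hat = successes / trials
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


# Hypothetical pMDP fragment: two transitions that share one parameter p.
transition_exprs = {
    "s0->goal": lambda p: p,
    "s1->goal": lambda p: p * p,
}

# Made-up observation counts per transition: (successes, trials).
counts = {"s0->goal": (160, 200), "s1->goal": (13, 20)}

delta = 0.05
per_transition_delta = delta / len(counts)  # union bound over transitions

# Classical view: one independent interval per transition probability.
intervals = {
    name: hoeffding_interval(k, n, per_transition_delta)
    for name, (k, n) in counts.items()
}


def parameter_confidence_set(grid_size=10001):
    """Grid points p in [0, 1] whose induced probabilities satisfy every interval."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return [
        p
        for p in grid
        if all(
            lo <= transition_exprs[name](p) <= hi
            for name, (lo, hi) in intervals.items()
        )
    ]


feasible = parameter_confidence_set()

# Range of each transition probability once the shared parameter is respected.
induced = {
    name: (min(f(p) for p in feasible), max(f(p) for p in feasible))
    for name, f in transition_exprs.items()
}

for name in transition_exprs:
    print(name, "independent:", intervals[name], "parametric:", induced[name])
```

In this toy setup the sparsely observed transition `s1->goal` inherits information from the well-observed `s0->goal` through the shared parameter, so its induced range is narrower than its independent interval. If every true transition probability lies in its interval, the true parameter value satisfies all constraints, which is why the parameter set keeps the same confidence level.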
Problem

Research questions and friction points this paper is trying to address.

Markov decision processes
uncertainty quantification
parameter dependencies
robust policy synthesis
statistical learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

parametric MDPs
transition dependencies
PAC learning
polytopic approximation
robust policy synthesis