🤖 AI Summary
This work addresses the undecidability of reachability probabilities in partially observable Markov decision processes (POMDPs) by introducing a new subclass, posterior-deterministic POMDPs, in which the next state is uniquely determined by the current state, the action taken, and the observation received. Leveraging this structural property, the authors develop a method that combines probabilistic inference with formal verification techniques to approximate, with arbitrary precision, the maximal probability of reaching a given set of target states. The subclass includes all standard MDPs as well as non-trivial examples such as the classic Tiger problem, and is one of the largest known subclasses of POMDPs for which the reachability value can be approximated. Within this subclass, the result sidesteps the undecidability barrier that holds for general POMDPs.
📝 Abstract
Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.
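To make the definition concrete, the following is a minimal Python sketch of a check for the posterior-deterministic property on a small finite model. It is an illustration under stated assumptions, not the authors' algorithm: the dictionary encoding `joint[(state, action)] -> {(next_state, observation): probability}`, the function names, and the Tiger-like model with absorbing `win`/`lose` states are all hypothetical choices made here (the abstract does not say how the Tiger POMDP is encoded for reachability). The `shuffle` action is likewise an invented counterexample.

```python
from collections import defaultdict

# Assumed encoding (not from the paper): joint[(state, action)] maps
# (next_state, observation) -> probability.


def is_posterior_deterministic(joint):
    """Return True if (state, action, observation) fixes the next state."""
    for dist in joint.values():
        successors = defaultdict(set)
        for (next_state, obs), prob in dist.items():
            if prob > 0:
                successors[obs].add(next_state)
        # Two possible successors under the same observation break the property.
        if any(len(states) > 1 for states in successors.values()):
            return False
    return True


def update_belief(joint, belief, action, obs):
    """Standard Bayesian belief update for the encoding above."""
    new_belief = defaultdict(float)
    for state, weight in belief.items():
        for (next_state, o), prob in joint[(state, action)].items():
            if o == obs:
                new_belief[next_state] += weight * prob
    total = sum(new_belief.values())
    if total == 0:
        raise ValueError("observation has probability zero under this belief")
    return {s: p / total for s, p in new_belief.items() if p > 0}


# Hypothetical Tiger-like model: listening is noisy, opening a door ends
# the episode in an absorbing 'win' or 'lose' state.
tiger_like = {
    ("TL", "listen"): {("TL", "hear_left"): 0.85, ("TL", "hear_right"): 0.15},
    ("TR", "listen"): {("TR", "hear_right"): 0.85, ("TR", "hear_left"): 0.15},
    ("TL", "open_left"): {("lose", "done"): 1.0},
    ("TL", "open_right"): {("win", "done"): 1.0},
    ("TR", "open_left"): {("win", "done"): 1.0},
    ("TR", "open_right"): {("lose", "done"): 1.0},
}
for absorbing in ("win", "lose"):
    for action in ("listen", "open_left", "open_right"):
        tiger_like[(absorbing, action)] = {(absorbing, "done"): 1.0}

# Invented counterexample: 'shuffle' re-randomizes the hidden state while
# emitting a single uninformative observation.
shuffled = dict(tiger_like)
for state in ("TL", "TR"):
    shuffled[(state, "shuffle")] = {("TL", "blank"): 0.5, ("TR", "blank"): 0.5}

print(is_posterior_deterministic(tiger_like))
print(is_posterior_deterministic(shuffled))

# A uniform belief stays uncertain after a noisy observation ...
print(update_belief(tiger_like, {"TL": 0.5, "TR": 0.5}, "listen", "hear_left"))
# ... but a point belief stays a point belief, whatever is observed.
print(update_belief(tiger_like, {"TL": 1.0}, "listen", "hear_right"))
```

In this encoding, each state in the belief support has at most one successor for a given action and observation, so the support can never grow. This is the sense in which a state, once known, remains known.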