Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs

📅 2026-05-12

🤖 AI Summary
This work addresses partially observable Markov decision processes (POMDPs) in which individual observation features can be missing and the missingness mechanism is unknown. It proposes the missingness-MDP (miss-MDP) framework, which brings the three classical missingness types, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), into planning under partial observability. Because learning an observation function from data is infeasible for general POMDPs, the approach exploits the structure of each missingness type to derive probably approximately correct (PAC) algorithms that learn the missingness function from action-observation trajectories. The result is an approximate but fully specified miss-MDP that is solved with off-the-shelf planners, and the resulting policies are proven to be ε-optimal in the true miss-MDP with high probability. Experiments confirm the theory and show better performance than two model-free POMDP methods.
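For reference, the three missingness types are commonly stated as below in the missing-data literature. The notation is an assumption made here for illustration and is not taken from the paper: x is the full feature vector, m is a binary mask marking which features are missing, and x_obs and x_mis are the observed and missing parts of x.

```latex
\begin{align*}
\text{MCAR:}\quad & \Pr(m \mid x) = \Pr(m)
  && \text{missingness is independent of all feature values} \\
\text{MAR:}\quad  & \Pr(m \mid x) = \Pr(m \mid x_{\mathrm{obs}})
  && \text{missingness depends only on observed values} \\
\text{MNAR:}\quad & \Pr(m \mid x) \neq \Pr(m \mid x_{\mathrm{obs}}) \text{ for some } x
  && \text{missingness depends on the missing values themselves}
\end{align*}
```

Each type in this list is strictly more general than the one before it, which is one reason the choice of type matters for what can be learned from data.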
📝 Abstract
We introduce missingness-MDPs (miss-MDPs), a novel subclass of partially observable Markov decision processes (POMDPs) that incorporates the theory of missing data. A miss-MDP is a POMDP whose observation function is a missingness function, specifying the probability that individual state features are missing (i.e., unobserved) at a time step. The literature distinguishes three canonical missingness types: missing (1) completely at random (MCAR), (2) at random (MAR), and (3) not at random (MNAR). Our planning problem is to compute near-optimal policies for a miss-MDP with an unknown missingness function, given a dataset of action-observation trajectories. Achieving such optimality guarantees for policies requires learning the missingness function from data, which is infeasible for general POMDPs. To overcome this challenge, we exploit the structural properties of different missingness types to derive probably approximately correct (PAC) algorithms for learning the missingness function. These algorithms yield an approximate but fully specified miss-MDP that we solve using off-the-shelf planning methods. We prove that, with high probability, the resulting policies are epsilon-optimal in the true miss-MDP. Empirical results confirm the theory and demonstrate superior performance of our approach over two model-free POMDP methods.
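To make the idea of a missingness function concrete, the sketch below simulates the simplest case, MCAR with each feature dropped independently, and estimates the per-feature missingness probabilities by counting. It is not the paper's algorithm: the function names, the independence assumption, and the Hoeffding-plus-union-bound sample size are all illustrative choices made here.

```python
import math
import random


def apply_mcar_mask(state, miss_probs, rng):
    """Hide feature i (replace it with None) with probability miss_probs[i].

    Assumption: features go missing independently of each other and of the
    state values, which is the MCAR case only.
    """
    return tuple(None if rng.random() < p else x for x, p in zip(state, miss_probs))


def estimate_miss_probs(observations):
    """Empirical frequency with which each feature is missing."""
    n = len(observations)
    k = len(observations[0])
    return [sum(obs[i] is None for obs in observations) / n for i in range(k)]


def hoeffding_samples(epsilon, delta, num_features):
    """Sample size so that every per-feature estimate is within epsilon of the
    truth with probability at least 1 - delta (Hoeffding plus a union bound)."""
    return math.ceil(math.log(2 * num_features / delta) / (2 * epsilon ** 2))


rng = random.Random(0)
true_probs = [0.1, 0.5, 0.3]  # hypothetical missingness probabilities
n = hoeffding_samples(epsilon=0.05, delta=0.05, num_features=len(true_probs))

# Under MCAR the underlying state does not influence the mask, so a fixed
# dummy state is enough for this illustration.
observations = [apply_mcar_mask((1.0, 2.0, 3.0), true_probs, rng) for _ in range(n)]
estimates = estimate_miss_probs(observations)
print(n, [round(p, 3) for p in estimates])
```

Under MAR or MNAR the mask depends on feature values, some of which are themselves hidden, so simple counting like this would no longer be enough. That is the setting in which, according to the abstract, the structure of each missingness type has to be exploited.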
Problem

Research questions and friction points this paper is trying to address.

missing data
POMDPs
missingness mechanism
policy optimization
partially observable environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

missingness-MDP
POMDP
missing data mechanism
PAC learning
partially observable planning