🤖 AI Summary
Existing variational belief inference methods for high-dimensional, partially observable Markov decision processes (POMDPs) suffer from poor scalability and rely on ground-truth state labels and precise system dynamics models. To address these limitations, we propose the Deep Belief Markov Model (DBMM), the first framework to integrate deep Markov models into the POMDP paradigm. DBMM enables end-to-end, model-free variational belief inference directly from raw observation sequences. It unifies treatment of discrete and continuous latent variables, captures high-dimensional nonlinear dynamics, and incorporates Bayesian filtering principles, recurrent neural architectures, and online parameter adaptation. Evaluated across multiple standard POMDP benchmarks, DBMM achieves significant improvements in belief estimation efficiency and cross-task generalization—without requiring state supervision or explicit system modeling.
📝 Abstract
This work introduces a novel deep learning-based architecture, termed the Deep Belief Markov Model (DBMM), which provides efficient, model-formulation-agnostic inference in Partially Observable Markov Decision Process (POMDP) problems. The POMDP framework allows for modeling and solving sequential decision-making problems under observation uncertainty. In complex, high-dimensional, partially observable environments, existing inference methods based on exact computation (e.g., via Bayes' theorem) or sampling algorithms do not scale well. Furthermore, ground-truth states may not be available for learning the exact transition dynamics. DBMMs extend deep Markov models to the partially observable decision-making framework and allow efficient belief inference entirely from available observation data via variational inference methods. By leveraging the expressive power of neural networks, DBMMs can infer and simulate nonlinear relationships in the system dynamics and naturally scale to problems with high dimensionality and discrete or continuous variables. In addition, the neural network parameters can be updated efficiently as new data become available. DBMMs can thus be used to infer a belief variable, enabling the derivation of POMDP solutions over the belief space. We evaluate the efficacy of the proposed methodology by assessing the model-formulation-agnostic inference capability of DBMMs on benchmark problems involving discrete and continuous variables.
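The recurrent variational belief update described above can be sketched in a few lines. The following is a minimal toy implementation, assuming a Gaussian belief latent updated from the previous belief, last action, and current observation via the reparameterization trick; all layer sizes, names, and the single-layer encoder are illustrative assumptions, not the paper's actual DBMM architecture.

```python
import numpy as np

class BeliefFilter:
    """Toy recurrent variational belief update: b_t ~ q(b_t | b_{t-1}, a_{t-1}, o_t).
    A hedged sketch only; the paper's DBMM uses learned deep networks and
    variational training, which are omitted here."""

    def __init__(self, obs_dim, act_dim, belief_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = belief_dim + act_dim + obs_dim
        # randomly initialized (untrained) encoder weights, for illustration
        self.W1 = rng.normal(0.0, 0.1, (hidden, in_dim))
        self.b1 = np.zeros(hidden)
        self.W_mu = rng.normal(0.0, 0.1, (belief_dim, hidden))
        self.W_lv = rng.normal(0.0, 0.1, (belief_dim, hidden))
        self.belief_dim = belief_dim
        self.rng = rng

    def step(self, b_prev, a_prev, o_t):
        # encode (previous belief, action, observation) into a hidden state
        h = np.tanh(self.W1 @ np.concatenate([b_prev, a_prev, o_t]) + self.b1)
        mu, logvar = self.W_mu @ h, self.W_lv @ h
        # reparameterization trick: sample the new belief latent
        eps = self.rng.standard_normal(self.belief_dim)
        b_t = mu + np.exp(0.5 * logvar) * eps
        return b_t, mu, logvar

# filter a short random observation/action sequence
rng = np.random.default_rng(1)
f = BeliefFilter(obs_dim=3, act_dim=2, belief_dim=4)
b = np.zeros(4)
for t in range(5):
    a = rng.standard_normal(2)
    o = rng.standard_normal(3)
    b, mu, logvar = f.step(b, a, o)
```

In a trained DBMM these networks would be fit by maximizing a variational lower bound over observation sequences, so the inferred belief can serve as the state for downstream POMDP planning.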