Partially Observable Monte-Carlo Graph Search

📅 2025-07-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Scalable offline solution of large partially observable Markov decision processes (POMDPs) remains challenging due to prohibitive computational complexity. This paper proposes POMCGS, a scalable offline policy learning algorithm. Its core innovation is adapting the online Monte-Carlo tree search (MCTS) paradigm into an offline-constructible, verifiable policy graph: the search tree is folded on the fly, while action progressive widening and observation clustering extend the method to certain continuous POMDPs. POMCGS enables full policy precomputation, analysis, and validation before deployment, reducing runtime energy consumption and latency. Evaluated on the most challenging standard POMDP benchmarks, POMCGS produces policies unattainable by prior offline methods, with values competitive with state-of-the-art online solvers. The approach establishes a practical, resource-efficient offline planning paradigm for embedded and latency-sensitive applications.

📝 Abstract
Currently, large partially observable Markov decision processes (POMDPs) are often solved by sampling-based online methods that interleave planning and execution phases. However, a pre-computed offline policy is more desirable in POMDP applications with time or energy constraints, and previous offline algorithms cannot scale to large POMDPs. In this article, we propose a new sampling-based algorithm, partially observable Monte-Carlo graph search (POMCGS), to solve large POMDPs offline. Unlike many online POMDP methods, which progressively grow a tree while performing (Monte-Carlo) simulations, POMCGS folds this search tree on the fly to construct a policy graph, so that computation is drastically reduced and users can analyze and validate the policy before embedding and executing it. Moreover, POMCGS, combined with the action progressive widening and observation clustering methods provided in this article, can address certain continuous POMDPs. Through experiments, we demonstrate that POMCGS can generate policies on the most challenging POMDPs, which cannot be computed by previous offline algorithms, and that their values are competitive with state-of-the-art online POMDP algorithms.
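The tree-folding idea from the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `belief_distance` metric (L1 over discrete beliefs) and the merge threshold are assumptions chosen for illustration.

```python
def belief_distance(b1, b2):
    """L1 distance between two discrete belief distributions (assumed metric)."""
    states = set(b1) | set(b2)
    return sum(abs(b1.get(s, 0.0) - b2.get(s, 0.0)) for s in states)

class PolicyGraph:
    """Toy policy graph: newly expanded belief nodes are merged with an
    existing node whenever their beliefs are close, so the search tree
    'folds' into a finite graph instead of growing without bound."""

    def __init__(self, merge_threshold=0.1):
        self.nodes = []  # list of (belief, recommended_action)
        self.merge_threshold = merge_threshold

    def fold(self, belief, action):
        """Return the index of a close-enough existing node (merge),
        or append a new node (expand)."""
        for i, (b, _) in enumerate(self.nodes):
            if belief_distance(b, belief) < self.merge_threshold:
                return i  # folded onto an existing node
        self.nodes.append((belief, action))
        return len(self.nodes) - 1
```

Two nearly identical beliefs map to the same graph node, while a clearly different belief opens a new one, which is what keeps the offline policy compact and inspectable.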
Problem

Research questions and friction points this paper is trying to address.

Online POMDP solvers interleave planning and execution, which is unsuitable under time or energy constraints
Pre-computed offline policies are preferable, but prior offline algorithms cannot scale to large POMDPs
Continuous state and observation spaces further complicate offline policy computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Solves large POMDPs offline via Monte-Carlo graph search
Folds the search tree into a policy graph on the fly, enabling validation before deployment
Addresses certain continuous POMDPs via action progressive widening and observation clustering
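The last two ingredients can be sketched in toy form. The `k * N**alpha` widening rule is a common convention for progressive widening and the nearest-centroid clustering radius is an assumption; neither is taken from the paper's exact formulation.

```python
def allowed_actions(n_visits, k=1.0, alpha=0.5):
    """Progressive widening: cap the number of candidate actions at a node
    so it grows as k * N**alpha with the node's visit count N."""
    return max(1, int(k * n_visits ** alpha))

def cluster_observation(obs, centroids, radius):
    """Observation clustering (toy 1-D version): assign a continuous
    observation to the nearest existing centroid within `radius`,
    or open a new cluster, keeping the observation branching finite."""
    for i, c in enumerate(centroids):
        if abs(obs - c) <= radius:
            return i
    centroids.append(obs)
    return len(centroids) - 1
```

Together these keep both the action and observation branching factors bounded, which is what makes an offline policy graph over continuous spaces feasible at all.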