🤖 AI Summary
This paper addresses the online approximate solution of large-scale partially observable Markov decision processes (POMDPs) in dynamically evolving environments. Methodologically, it introduces a novel anytime online planning algorithm that samples meaningful future histories deeply while forcing a gradual policy update, operating within a reference policy programming framework that circumvents explicit numerical optimization. Theoretically, it establishes a performance loss bound expressed in terms of the *mean* sampling approximation error rather than the conventional maximum, a crucial property given the sampling sparsity of online planning. Empirical evaluation demonstrates that the algorithm considerably outperforms state-of-the-art online POMDP solvers on large-scale dynamic tasks, including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps.
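To make the planning loop concrete, below is a minimal Python sketch of the kind of scheme described above: sample deep future histories under a reference policy, then make a gradual, regularised policy update rather than jumping to the greedy action. Everything here is an illustrative assumption, not the paper's actual algorithm or interface: the generative model `simulate_step`, the planner `plan`, and the exponentiated-update step size `eta` are all hypothetical names.

```python
import math
import random
from collections import defaultdict


def plan(belief_particles, actions, simulate_step,
         depth=150, iterations=1000, gamma=0.99, eta=0.05):
    """Anytime online planning sketch: deep history sampling + gradual update.

    belief_particles: sampled states representing the current belief.
    simulate_step:    generative model, (state, action) -> (next_state, reward).
    eta:              step size of the gradual policy update.
    """
    ref_policy = {a: 1.0 / len(actions) for a in actions}  # start uniform
    q_hat = defaultdict(float)   # running Monte Carlo Q estimates
    n = defaultdict(int)         # visit counts per first action

    for _ in range(iterations):          # anytime: stop whenever time runs out
        state = random.choice(belief_particles)
        first = _sample(ref_policy)      # stay close to the reference policy
        ret, disc, s, a = 0.0, 1.0, state, first
        for _ in range(depth):           # one *deep* future history
            s, r = simulate_step(s, a)
            ret += disc * r
            disc *= gamma
            a = _sample(ref_policy)      # open-loop here for brevity; a real
                                         # solver would branch on observations
        n[first] += 1
        q_hat[first] += (ret - q_hat[first]) / n[first]

        # Gradual update: exponentiated re-weighting toward higher-value
        # actions (mirror-descent flavour), not a jump to the greedy action.
        q_max = max(q_hat[a_] for a_ in actions)  # for numerical stability
        w = {a_: ref_policy[a_] * math.exp(eta * (q_hat[a_] - q_max))
             for a_ in actions}
        z = sum(w.values())
        ref_policy = {a_: w[a_] / z for a_ in actions}

    return ref_policy


def _sample(dist):
    """Draw a key from a {key: probability} dict."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if acc >= r:
            return k
    return k
```

The gradual, exponentiated re-weighting keeps the sampling policy close to the reference policy between iterations, which is the kind of property an average-error analysis can exploit; a full solver would additionally condition rollouts on observations rather than rolling out open-loop as this sketch does.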
📝 Abstract
This paper proposes Partially Observable Reference Policy Programming, a novel anytime online approximate POMDP solver that samples meaningful future histories very deeply while simultaneously forcing a gradual policy update. We provide theoretical guarantees for the algorithm's underlying scheme, showing that the performance loss is bounded by the average of the sampling approximation errors rather than the usual maximum -- a crucial requirement given the sampling sparsity of online planning. Empirical evaluations on two large-scale problems with dynamically evolving environments -- including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps -- corroborate the theoretical results and indicate that our solver considerably outperforms current online benchmarks.
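The contrast between the average-error and maximum-error guarantees can be written schematically as follows. The notation is an illustration only, not the paper's exact theorem statement: per-iteration sampling approximation errors $\epsilon_1, \dots, \epsilon_N$, a problem-dependent constant $C$, and initial belief $b_0$ are all assumed symbols.

```latex
% Schematic contrast only -- notation assumed, not the paper's exact theorem.
% \epsilon_1, ..., \epsilon_N: per-iteration sampling approximation errors;
% C: a problem-dependent constant; b_0: the initial belief.
\[
  V^{\pi^\ast}(b_0) - V^{\hat{\pi}}(b_0)
    \;\le\; \frac{C}{N} \sum_{i=1}^{N} \epsilon_i
  \qquad \text{(mean-error bound)}
\]
% versus the more common worst-case form
\[
  V^{\pi^\ast}(b_0) - V^{\hat{\pi}}(b_0)
    \;\le\; C \max_{1 \le i \le N} \epsilon_i
  \qquad \text{(max-error bound).}
\]
% Under sparse sampling, a handful of badly approximated histories can blow
% up the maximum, while the average remains controlled.
```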