🤖 AI Summary
This work addresses the computational bottleneck in partially observable Markov decision processes (POMDPs) arising from temporal-scale conflicts among goal reaching, safety assurance, and active information gathering. To resolve this, the authors propose a hierarchical certificate-based control architecture that decouples these three objectives into modular components in belief space. Information gathering is formulated as a convergence problem via a belief control Lyapunov function (BCLF), while finite-horizon probabilistic safety guarantees are enforced through a belief control barrier function (BCBF) grounded in conformal prediction. Real-time control synthesis is achieved via a lightweight quadratic program. Experiments demonstrate millisecond-level response times in simulation and on a space-robotics platform, supporting non-Gaussian belief representations exceeding 10⁴ dimensions. The approach significantly outperforms existing constrained POMDP solvers, achieving notable improvements in both task success rate and safety.
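The summary mentions that the BCBF's probabilistic safety guarantee is grounded in conformal prediction. The paper's exact calibration procedure is not shown here, but the core split-conformal step is standard: from held-out calibration scores, take the ⌈(n+1)(1−α)⌉-th order statistic as a threshold that covers a fresh score with probability at least 1−α. A minimal sketch (function name and example data are illustrative, not from the paper):

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: given n calibration scores, the
    ceil((n+1)*(1-alpha))-th smallest score covers a fresh
    exchangeable score with probability >= 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        # Not enough calibration data for this coverage level
        return float("inf")
    return sorted(scores)[k - 1]

# Example: 9 calibration scores, target 90% coverage
cal = [0.2, 0.5, 0.1, 0.9, 0.3, 0.7, 0.4, 0.6, 0.8]
q = conformal_quantile(cal, alpha=0.1)  # rank k = ceil(10 * 0.9) = 9
```

A threshold of this form can then parameterize a barrier constraint so that constraint violation is bounded by α over the calibration distribution.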
📝 Abstract
Partially Observable Markov Decision Processes (POMDPs) provide a principled framework for robot decision-making under uncertainty. Solving reach-avoid POMDPs, however, requires coordinating three distinct behaviors: goal reaching, safety, and active information gathering to reduce uncertainty. Existing online POMDP solvers attempt to address all three within a single belief tree search, but this unified approach struggles with the conflicting time scales inherent to these objectives. We propose a layered, certificate-based control architecture that operates directly in belief space, decoupling goal reaching, information gathering, and safety into modular components. We introduce Belief Control Lyapunov Functions (BCLFs) that formalize information gathering as a Lyapunov convergence problem in belief space, and show how they can be learned via reinforcement learning. For safety, we develop Belief Control Barrier Functions (BCBFs) that leverage conformal prediction to provide probabilistic safety guarantees over finite horizons. The resulting control synthesis reduces to lightweight quadratic programs solvable in real time, even for non-Gaussian belief representations with dimension $>10^4$. Experiments in simulation and on a space-robotics platform demonstrate real-time performance and improved safety and task success compared to state-of-the-art constrained POMDP solvers.
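The abstract states that control synthesis reduces to lightweight quadratic programs combining the BCLF (convergence) and BCBF (safety) conditions. The paper's actual QP is not reproduced here; the sketch below shows the generic structure for a single scalar input, where the min-norm QP with a hard barrier constraint and a soft Lyapunov-decrease condition admits a closed-form solution by clamping. All symbols (`u_nom`, the Lie-derivative terms `LfV`, `LgV`, `Lfh`, `Lgh`, and the gains `gamma`, `beta`) are illustrative assumptions:

```python
def certificate_qp_control(u_nom, LfV, LgV, V, Lfh, Lgh, h,
                           gamma=1.0, beta=1.0):
    """Min-norm controller sketch (scalar input): minimize
    (u - u_nom)^2 with the barrier constraint
    Lfh + Lgh*u + beta*h >= 0 kept hard, while the Lyapunov
    decrease LfV + LgV*u + gamma*V <= 0 is treated as soft
    (safety overrides convergence when they conflict)."""
    # Clamp u_nom to satisfy the Lyapunov-decrease half-line, if possible
    if LgV != 0.0:
        u_clf = (-gamma * V - LfV) / LgV
        u = min(u_nom, u_clf) if LgV > 0 else max(u_nom, u_clf)
    else:
        u = u_nom
    # Project onto the barrier-feasible half-line (hard safety constraint)
    if Lgh > 0:
        u = max(u, (-beta * h - Lfh) / Lgh)
    elif Lgh < 0:
        u = min(u, (-beta * h - Lfh) / Lgh)
    return u

# Example: convergence asks for u = -1, but safety only permits u >= -0.5
u = certificate_qp_control(0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.5)  # -> -0.5
```

With vector inputs and multiple constraints this becomes a small QP solved numerically each control step, which is what keeps the synthesis real-time even for high-dimensional beliefs.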