🤖 AI Summary
In POMDP planning, the high computational cost of belief state transitions (e.g., ray casting and collision checking) causes severe planning latency. To address this, we propose *lazy belief transition*, a paradigm that decouples search guidance from exact Bayesian belief updates within heuristic search frameworks (RTDP-Bel/LAO*), deferring expensive belief transitions and instead relying on Q-value estimates to direct the search. This is the first approach to trigger belief computation on demand, achieving near-optimal policies while drastically reducing planning overhead. Evaluated on contact-rich grasping, rough-terrain navigation, and 1D LiDAR-based indoor navigation tasks, our method accelerates planning by several-fold to over an order of magnitude. These results demonstrate its efficacy and practicality in perception–action coupled robotic domains where real-time responsiveness and accurate uncertainty propagation are critical.
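The core idea above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: `q_estimate` and `belief_transition` are hypothetical stand-ins for a cheap Q-value estimator and the expensive Bayesian belief transition. Instead of expanding successor beliefs for every action, the planner ranks actions by estimated Q-value and pays the transition cost only for the action it commits to.

```python
def lazy_greedy_step(belief, actions, q_estimate, belief_transition):
    """Select the action with the lowest estimated Q-value (cost-to-go),
    then compute exact successor beliefs for that action alone."""
    best = min(actions, key=lambda a: q_estimate(belief, a))  # cheap guidance
    return best, belief_transition(belief, best)              # deferred, on demand

# Toy usage: count how many expensive transitions are actually evaluated.
calls = []
def belief_transition(belief, action):
    calls.append(action)                 # stand-in for ray casting / collision checks
    return {("b_next", action): 1.0}     # successor beliefs and their probabilities

q_table = {"left": 3.0, "right": 1.5}    # assumed toy Q-value estimates
best, successors = lazy_greedy_step("b0", ["left", "right"],
                                    lambda b, a: q_table[a], belief_transition)
# Only one expensive transition is computed, for the chosen action "right".
```

A non-lazy planner would call `belief_transition` once per action at every expansion; here the call count grows with the number of *selected* actions instead, which is where the speedup comes from when each transition involves simulation or ray casting.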
📝 Abstract
Heuristic search solvers like RTDP-Bel and LAO* have proven effective for computing optimal and bounded-suboptimal solutions for Partially Observable Markov Decision Processes (POMDPs), which are typically formulated as belief MDPs. A belief represents a probability distribution over possible system states. Given a parent belief and an action, computing belief state transitions involves Bayesian updates that combine the transition and observation models of the POMDP to determine successor beliefs and their transition probabilities. However, in a class of problems, particularly in robotics, computing these transitions can be prohibitively expensive due to the costly physics simulations, ray casting, or collision checks required by the underlying transition and observation models, leading to long planning times. To address this challenge, we propose Lazy RTDP-Bel and Lazy LAO*, which defer computing expensive belief state transitions by leveraging Q-value estimation, significantly reducing planning time. We demonstrate the superior performance of the proposed lazy planners in domains such as contact-rich manipulation for pose estimation, outdoor navigation in rough terrain, and indoor navigation with a 1-D LiDAR sensor. Additionally, we discuss practical Q-value estimation techniques for commonly encountered problem classes that our lazy planners can leverage. Our results show that lazy heuristic search methods dramatically improve planning speed by postponing expensive belief transition evaluations while maintaining solution quality.
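To make the expensive step concrete, here is a minimal sketch of one belief state transition in a discrete POMDP: the standard Bayes update b'(s') ∝ O(z | s') · Σ_s T(s' | s, a) b(s). The 3-state problem, the transition model `T`, and the observation model `O` below are illustrative assumptions, not taken from the paper's domains; in the robotics settings described above, evaluating `T` and `O` is exactly what requires simulation, ray casting, or collision checking.

```python
import numpy as np

def belief_update(belief, action, obs, T, O):
    """Exact Bayesian belief transition for a discrete POMDP.
    T[a][s, s'] = P(s' | s, a); O[z][s'] = P(z | s')."""
    predicted = T[action].T @ belief     # prediction: sum_s T(s'|s,a) b(s)
    unnormalized = O[obs] * predicted    # correction: weight by P(z|s')
    p_obs = unnormalized.sum()           # P(z | b, a), the transition probability
    if p_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / p_obs, p_obs

# Toy 3-state example (assumed numbers, for illustration only).
T = {"move": np.array([[0.8, 0.2, 0.0],
                       [0.0, 0.8, 0.2],
                       [0.0, 0.0, 1.0]])}
O = {"near": np.array([0.1, 0.3, 0.9])}
b = np.array([1.0, 0.0, 0.0])
b_next, p_obs = belief_update(b, "move", "near", T, O)
```

Each observation `z` with nonzero `p_obs` yields one successor belief, so a full expansion performs this update once per (action, observation) pair; the lazy planners defer exactly these calls.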