🤖 AI Summary
Existing vision-language models (VLMs) assume full observability, rendering them inadequate for reliable reasoning and planning in open-world robotic mobile manipulation tasks characterized by long-horizon dependencies and partial observability.
Method: We propose a novel uncertainty-aware framework that repurposes VLMs as uncertainty estimators, enabling a symbolic belief representation with confidence modeling over visual-linguistic facts. Logical goals are grounded, and active information-gathering plans are generated, directly within this belief space. The approach integrates VLMs, symbolic belief representations, a parametric skill library, and logical goal parsing.
Results: Evaluated in real-world multi-task scenarios and in simulation, the method gathers information more efficiently than end-to-end VLM-based planning and VLM-based state estimation baselines, achieving higher task success rates and greater robustness in partially observable environments.
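The belief representation described above can be sketched minimally as a mapping from symbolic facts to confidence scores, with a threshold separating facts the agent can treat as known from those a planner should target with information gathering. All names here (`Fact`, `BeliefState`, the threshold value) are illustrative assumptions, not the paper's actual API; in the real system the confidences would come from querying a VLM against current observations.

```python
# Hypothetical sketch of a symbolic belief state over visual-linguistic
# facts. Confidence values stand in for VLM-estimated probabilities.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    predicate: str          # e.g. "in"
    args: tuple             # e.g. ("cup", "cabinet")


class BeliefState:
    """Maps symbolic facts to confidence scores in [0, 1]."""

    def __init__(self, known_threshold: float = 0.9):
        self.confidences: dict[Fact, float] = {}
        self.known_threshold = known_threshold

    def update(self, fact: Fact, confidence: float) -> None:
        # In the paper's setting this would be a VLM's estimated
        # probability that the fact holds, given the current view.
        self.confidences[fact] = confidence

    def status(self, fact: Fact) -> str:
        """Classify a fact as 'true', 'false', or 'unknown'."""
        c = self.confidences.get(fact, 0.5)  # unseen -> maximally uncertain
        if c >= self.known_threshold:
            return "true"
        if c <= 1.0 - self.known_threshold:
            return "false"
        return "unknown"

    def unknown_facts(self) -> list[Fact]:
        """Facts a belief-space planner should resolve by looking around."""
        return [f for f in self.confidences if self.status(f) == "unknown"]


belief = BeliefState()
belief.update(Fact("in", ("cup", "cabinet")), 0.55)   # uncertain -> go look
belief.update(Fact("on", ("apple", "table")), 0.97)   # confident -> true
```

Under this sketch, a planner would sequence information-gathering skills (e.g. navigate and look inside the cabinet) for each fact in `unknown_facts()` before committing to actions that depend on those facts.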
📝 Abstract
Generalizable robotic mobile manipulation in open-world environments poses significant challenges due to long horizons, complex goals, and partial observability. A promising approach to address these challenges involves planning with a library of parameterized skills, where a task planner sequences these skills to achieve goals specified in structured languages, such as logical expressions over symbolic facts. While vision-language models (VLMs) can be used to ground these expressions, they often assume full observability, leading to suboptimal behavior when the agent lacks sufficient information to evaluate facts with certainty. This paper introduces a novel framework that leverages VLMs as a perception module to estimate uncertainty and facilitate symbolic grounding. Our approach constructs a symbolic belief representation and uses a belief-space planner to generate uncertainty-aware plans that incorporate strategic information gathering. This enables the agent to effectively reason about partial observability and property uncertainty. We demonstrate our system on a range of challenging real-world tasks that require reasoning in partially observable environments. Simulated evaluations show that our approach outperforms both vanilla VLM-based end-to-end planning and VLM-based state estimation baselines by planning for and executing strategic information gathering. This work highlights the potential of VLMs to construct belief-space symbolic scene representations, enabling downstream tasks such as uncertainty-aware planning.