🤖 AI Summary
To address the lack of prior knowledge in robotic planning under partial observability, this paper proposes a zero-shot scene-completion framework built on pretrained generative models. Without fine-tuning or additional training, the method conditions large-scale generative models on partial RGB-D observations to jointly infer the geometric occupancy and semantic distribution of unobserved regions, producing spatially coherent and semantically plausible complete 3D point clouds that are directly usable for configuration-space planning. This work is the first to employ foundation models for zero-shot environmental prior modeling, enabling uncertainty-aware commonsense reasoning about occluded areas such as the space behind a door. Evaluated on a Matterport3D “behind-the-door” navigation benchmark, the generated point clouds exhibit high diversity, structural fidelity, and adherence to physical and semantic constraints, yielding significant improvements in planning success rate and path quality.
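The summary describes a conditional sampling loop: a pretrained generative model is queried, zero-shot, with a partial RGB-D point cloud and returns plausible completions of the unobserved regions. The sketch below illustrates that loop in Python; `PretrainedSceneCompleter` and its `sample` interface are hypothetical stand-ins (the paper's actual model and API are not given here), so the stub only shows the shape of the data flow, not a real completion model.

```python
import numpy as np

class PretrainedSceneCompleter:
    """Hypothetical stand-in for a large pretrained generative model used zero-shot."""

    def sample(self, partial_points: np.ndarray, rng: np.random.Generator) -> dict:
        # A real model would inpaint unobserved geometry and semantics
        # conditioned on the partial RGB-D point cloud; this stub only
        # perturbs the input so the pipeline's data flow is visible.
        n = partial_points.shape[0]
        completed = partial_points + rng.normal(scale=0.01, size=(n, 3))
        semantics = rng.integers(0, 10, size=n)  # per-point class IDs (illustrative)
        return {"points": completed, "semantics": semantics}

def sample_scene_priors(partial_points: np.ndarray, k: int = 8, seed: int = 0) -> list:
    """Draw k completed-scene samples conditioned on one partial observation.

    Each sample is a plausible full point cloud with per-point semantics;
    together the k samples form an empirical prior over unobserved space.
    """
    model = PretrainedSceneCompleter()
    rng = np.random.default_rng(seed)
    return [model.sample(partial_points, rng) for _ in range(k)]

if __name__ == "__main__":
    partial = np.random.default_rng(1).uniform(-2.0, 2.0, size=(500, 3))
    samples = sample_scene_priors(partial, k=8)
    print(f"{len(samples)} completions of {samples[0]['points'].shape[0]} points each")
```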
📝 Abstract
Priors are vital for planning under partial observability, yet difficult to obtain in practice. We present a sampling-based pipeline that leverages large-scale pretrained generative models to produce probabilistic priors capturing environmental uncertainty and spatio-semantic relationships in a zero-shot manner. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics, formulated to be directly useful in configuration-space planning. We establish a Matterport3D benchmark of rooms partially visible through doorways, where a robot must navigate to an unobserved target object. Effective priors for this setting must represent both occupancy and target-location uncertainty in unobserved regions. Experiments show that our approach recovers commonsense spatial semantics consistent with ground truth, yielding diverse, clean 3D point clouds usable in motion planning, highlighting the promise of generative models as a rich source of priors for robotic planning.
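For planning, the abstract's "probabilistic priors" can be read as an empirical distribution over the sampled completions. As a minimal sketch of how such samples might feed configuration-space planning, the function below voxelizes k completions into a per-cell occupancy probability and a normalized target-location distribution; the 2D projection, grid bounds, resolution, and the `samples` dictionary layout are all illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def occupancy_prior(samples, target_class: int, res: float = 0.1,
                    bounds=((-3.0, 3.0), (-3.0, 3.0))):
    """Aggregate k sampled scene completions into voxelized priors.

    Returns (p_occ, p_target): per-cell occupancy probability across samples,
    and a normalized distribution over the target object's location. Assumes
    a 2D projection for a planar robot; grid parameters are illustrative.
    """
    (x0, x1), (y0, y1) = bounds
    nx = int((x1 - x0) / res)
    ny = int((y1 - y0) / res)
    occ_counts = np.zeros((nx, ny))
    tgt_counts = np.zeros((nx, ny))

    for s in samples:
        pts, sem = s["points"], s["semantics"]
        ix = np.clip(((pts[:, 0] - x0) / res).astype(int), 0, nx - 1)
        iy = np.clip(((pts[:, 1] - y0) / res).astype(int), 0, ny - 1)
        hit = np.zeros((nx, ny), dtype=bool)
        hit[ix, iy] = True                      # cells occupied in this sample
        occ_counts += hit
        mask = sem == target_class
        thit = np.zeros((nx, ny), dtype=bool)
        thit[ix[mask], iy[mask]] = True         # cells holding the target here
        tgt_counts += thit

    p_occ = occ_counts / len(samples)
    p_target = tgt_counts / max(tgt_counts.sum(), 1.0)
    return p_occ, p_target

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_samples = [{"points": rng.uniform(-3, 3, size=(400, 3)),
                     "semantics": rng.integers(0, 10, size=400)}
                    for _ in range(8)]
    p_occ, p_target = occupancy_prior(fake_samples, target_class=3)
    print(p_occ.shape, float(p_target.sum()))
```

A planner can then treat `p_occ` as a soft obstacle cost and `p_target` as a goal distribution, which is one way the diversity across samples can translate into uncertainty-aware navigation behavior.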