🤖 AI Summary
To address ambiguous instance localization in open-world 3D semantic mapping, caused by features leaking across voxel boundaries and blending neighboring regions, this paper proposes OV-MAP, a zero-shot 3D instance segmentation and mapping framework for mobile robots. The method projects open-vocabulary, class-agnostic 2D segmentation masks into 3D space using a supplemented depth image that merges raw sensor depth with synthetic depth rendered from the accumulated point cloud, and it introduces a 3D mask voting mechanism for instance-level fusion that requires no 3D annotations. Evaluated on ScanNet200 and Replica, the approach achieves state-of-the-art zero-shot 3D instance segmentation performance, and real-world deployments demonstrate robustness and cross-scene generalization across diverse environments.
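As a rough illustration of the depth-supplementation step, the sketch below renders a synthetic depth map from an accumulated point cloud via z-buffered pinhole projection and uses it to fill holes in the raw sensor depth. The function names (`render_synthetic_depth`, `supplement_depth`), the zero-valued-hole convention, and the intrinsics handling are our own assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def render_synthetic_depth(points_cam: np.ndarray, K: np.ndarray,
                           shape: tuple) -> np.ndarray:
    """Z-buffered pinhole projection of camera-frame points (N, 3) into an
    H x W depth map using intrinsics K (3 x 3). Depth is in meters."""
    h, w = shape
    depth = np.zeros((h, w), dtype=np.float32)
    z = points_cam[:, 2]
    front = z > 0                      # keep points in front of the camera
    uvw = points_cam[front] @ K.T      # [fx*x + cx*z, fy*y + cy*z, z]
    u = (uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = (uvw[:, 1] / uvw[:, 2]).astype(np.int64)
    zf = z[front]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, zf = u[ok], v[ok], zf[ok]
    order = np.argsort(-zf)            # far-to-near, so near points overwrite
    depth[v[order], u[order]] = zf[order]
    return depth

def supplement_depth(raw_depth: np.ndarray,
                     synthetic_depth: np.ndarray) -> np.ndarray:
    """Fill invalid (zero) pixels of the raw sensor depth with the
    point-cloud-rendered synthetic depth; valid sensor depth is kept."""
    filled = raw_depth.copy()
    holes = (raw_depth <= 0) & (synthetic_depth > 0)
    filled[holes] = synthetic_depth[holes]
    return filled
```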
📝 Abstract
We introduce OV-MAP, a novel approach to open-world 3D mapping for mobile robots that integrates open-vocabulary features into 3D maps to enhance object recognition capabilities. A significant challenge arises when features from adjacent voxels overlap: features spill over voxel boundaries and blend neighboring regions together, reducing instance-level precision. Our method overcomes this by employing a class-agnostic segmentation model to project 2D masks into 3D space, combined with a supplemented depth image created by merging raw and synthetic depth from point clouds. This approach, along with a 3D mask voting mechanism, enables accurate zero-shot 3D instance segmentation without relying on 3D supervised segmentation models. We assess the effectiveness of our method through comprehensive experiments on the public ScanNet200 and Replica datasets, demonstrating superior zero-shot performance, and we further validate its robustness and adaptability through experiments in diverse real-world environments.
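To make the 3D mask voting idea concrete, the following minimal sketch fuses per-frame 3D masks (2D masks lifted through the supplemented depth) at the voxel level: each mask votes for the voxels it covers, and a mask is merged into an existing map-level instance when their voxel overlap is high enough. The `MaskVoter` class, the overlap ratio, and the 0.3 threshold are illustrative assumptions; the paper's exact voting and fusion rules may differ.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.05) -> set:
    """Quantize 3D points (N, 3) to a set of integer voxel coordinates."""
    return set(map(tuple, np.floor(points / voxel_size).astype(np.int64)))

class MaskVoter:
    """Fuse per-frame 3D masks into map-level instances by voxel voting.
    Illustrative sketch only; not the paper's exact mechanism."""

    def __init__(self, overlap_thresh: float = 0.3):
        self.instances = []                 # each: dict voxel -> vote count
        self.overlap_thresh = overlap_thresh

    def add_mask(self, mask_points: np.ndarray, voxel_size: float = 0.05):
        votes = voxelize(mask_points, voxel_size)
        if not votes:
            return
        # Match against the existing instance with the highest overlap,
        # measured relative to the smaller of the two voxel sets.
        best, best_overlap = None, 0.0
        for inst in self.instances:
            inter = len(votes & inst.keys())
            overlap = inter / min(len(votes), len(inst))
            if overlap > best_overlap:
                best, best_overlap = inst, overlap
        if best is not None and best_overlap >= self.overlap_thresh:
            for v in votes:                 # merge: vote into matched instance
                best[v] = best.get(v, 0) + 1
        else:                               # otherwise start a new instance
            self.instances.append(dict.fromkeys(votes, 1))

    def instance_of(self, voxel: tuple) -> int:
        """Assign a voxel to the instance with the most accumulated votes."""
        counts = [inst.get(voxel, 0) for inst in self.instances]
        if not counts or max(counts) == 0:
            return -1                       # unlabeled voxel
        return int(np.argmax(counts))
```

In this sketch, per-voxel vote counts let later, cleaner observations outweigh early spurious masks, which is one plausible way a voting scheme avoids the boundary-leakage problem described above.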