OV-MAP : Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots

📅 2024-10-14
🏛️ IEEE/RSJ International Conference on Intelligent Robots and Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address ambiguous instance localization in open-world 3D semantic mapping caused by feature leakage across voxel boundaries, this paper proposes a zero-shot 3D instance segmentation mapping framework for mobile robots. The method projects class-agnostic 2D segmentation masks into 3D using a supplemented depth image that merges raw depth with synthetic depth derived from point clouds, then applies a 3D mask voting mechanism to fuse the projected masks into instances without relying on 3D supervised segmentation models. Evaluated on ScanNet200 and Replica, the approach reports superior zero-shot 3D instance segmentation performance compared with existing methods. Real-world experiments further demonstrate robustness and adaptability across diverse environments.

📝 Abstract
We introduce OV-MAP, a novel approach to open-world 3D mapping for mobile robots that integrates open-vocabulary features into 3D maps to enhance object recognition. A significant challenge is that features spill over voxel boundaries: overlapping features from adjacent voxels blend neighboring regions together and reduce instance-level precision. Our method overcomes this by using a class-agnostic segmentation model to project 2D masks into 3D space, combined with a supplemented depth image created by merging raw depth with synthetic depth from point clouds. Together with a 3D mask voting mechanism, this enables accurate zero-shot 3D instance segmentation without relying on 3D supervised segmentation models. We assess the method through comprehensive experiments on public datasets such as ScanNet200 and Replica, demonstrating superior zero-shot performance, robustness, and adaptability across diverse environments. We also conduct real-world experiments that demonstrate the same adaptability and robustness outside benchmark settings.
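The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how raw and synthetic depth are merged, so the rule below (keep raw depth where it is valid, otherwise fall back to synthetic depth) is an assumption, as are the function names `merge_depth` and `backproject_mask` and the pinhole intrinsics `fx`, `fy`, `cx`, `cy`. It is not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
DepthImage = Sequence[Sequence[Optional[float]]]


def merge_depth(raw: DepthImage, synthetic: DepthImage) -> List[List[Optional[float]]]:
    """Hypothetical merge rule: keep raw depth where valid (> 0),
    otherwise use the synthetic depth value at the same pixel."""
    return [
        [r if r and r > 0 else s for r, s in zip(raw_row, syn_row)]
        for raw_row, syn_row in zip(raw, synthetic)
    ]


def backproject_mask(mask, depth, fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift every masked pixel with valid depth into camera-frame 3D
    using the standard pinhole camera model."""
    points: List[Point] = []
    for v, row in enumerate(mask):
        for u, inside in enumerate(row):
            z = depth[v][u]
            if inside and z and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 example: zeros mark missing depth.
raw = [[1.0, 0.0], [0.0, 2.0]]
synthetic = [[1.5, 1.5], [0.0, 2.5]]
mask = [[1, 1], [1, 0]]

merged = merge_depth(raw, synthetic)
points = backproject_mask(mask, merged, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In this toy case the pixel at row 0, column 1 has no raw depth but still yields a 3D point because the synthetic depth fills the hole, while the pixel at row 1, column 0 is dropped because neither source has a valid value.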
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D object recognition in open-world robot mapping
Overcoming feature spill-over in voxel-based instance segmentation
Achieving zero-shot 3D segmentation without supervised 3D models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates open-vocabulary features into 3D maps
Uses class-agnostic 2D-to-3D mask projection
Employs 3D mask voting for zero-shot segmentation
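To make the idea of mask voting concrete, here is a minimal sketch of one plausible voting scheme. The page does not describe the paper's actual voting rule, so everything here is an assumption: masks are assumed to arrive already associated with a candidate instance ID, each mask casts one vote per voxel it touches, and a voxel keeps the majority label only if it reaches a hypothetical `min_votes` threshold. The names `voxelize` and `vote_instances` are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(point: Point, voxel_size: float) -> Voxel:
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(int(c // voxel_size) for c in point)


def vote_instances(
    observations: Iterable[Tuple[Hashable, Sequence[Point]]],
    voxel_size: float = 0.05,
    min_votes: int = 2,
) -> Dict[Voxel, Hashable]:
    """Assign each voxel the instance ID that most projected masks agree on.

    observations: one (instance_id, points) pair per projected 2D mask.
    """
    votes: Dict[Voxel, Counter] = defaultdict(Counter)
    for instance_id, points in observations:
        # A mask votes at most once per voxel, however many points land in it.
        for voxel in {voxelize(p, voxel_size) for p in points}:
            votes[voxel][instance_id] += 1

    labels: Dict[Voxel, Hashable] = {}
    for voxel, counter in votes.items():
        instance_id, count = counter.most_common(1)[0]
        if count >= min_votes:
            labels[voxel] = instance_id
    return labels


# Two masks per object; one view of each object spills into the neighbor's voxel.
observations = [
    ("chair", [(0.01, 0.01, 0.01), (0.06, 0.01, 0.01)]),
    ("chair", [(0.02, 0.02, 0.02)]),
    ("table", [(0.02, 0.00, 0.00), (0.07, 0.00, 0.00)]),
    ("table", [(0.08, 0.01, 0.01)]),
]
labels = vote_instances(observations, voxel_size=0.05, min_votes=2)
```

Each voxel in the toy example receives one stray vote from the neighboring object, and the majority vote discards it. This is the kind of boundary spill-over the method is meant to suppress, although the real system may weight or associate masks differently.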
Authors

Juno Kim
PhD student, UC Berkeley EECS
Yesol Park
Interdisciplinary Program in AI, Seoul National University
Hye Jung Yoon
Interdisciplinary Program in AI, Seoul National University
Byoung-Tak Zhang
Professor of Computer Science, Cognitive Science, and Brain Science, Seoul National University
Machine Learning, Artificial Intelligence, Cognitive Science