ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation

📅 2023-11-24
🏛️ arXiv.org
📈 Citations: 5
Influential: 2
🤖 AI Summary
This work addresses zero-shot part segmentation of 3D object point clouds without any additional training or fine-tuning of the foundation models. It proposes ZeroPS, a pipeline that transfers knowledge from two pretrained 2D foundation models, SAM and GLIP, to 3D by connecting multi-view correspondence with the models' prompt mechanisms. SAM's prompt mechanism, combined with co-viewed regions across views, lifts 2D segmentations to 3D; GLIP's prompt mechanism, combined with 2D-3D view projection, relates text classes to 3D parts; and aggregating multi-view observations improves the final predictions. On the PartNetE and AKBSeg benchmarks, ZeroPS significantly outperforms the state-of-the-art method on zero-shot unlabeled and instance segmentation. It applies to both simulated and real-world data and is hardly affected by domain shift.
📝 Abstract
Zero-shot 3D part segmentation is a challenging and fundamental task. In this work, we propose a novel pipeline, ZeroPS, which achieves high-quality knowledge transfer from 2D pretrained foundation models (FMs), SAM and GLIP, to 3D object point clouds. We aim to explore the natural relationship between multi-view correspondence and the FMs' prompt mechanism and build bridges on it. In ZeroPS, the relationship manifests as follows: 1) lifting 2D to 3D by leveraging co-viewed regions and SAM's prompt mechanism, 2) relating 1D classes to 3D parts by leveraging 2D-3D view projection and GLIP's prompt mechanism, and 3) enhancing prediction performance by leveraging multi-view observations. Extensive evaluations on the PartNetE and AKBSeg benchmarks demonstrate that ZeroPS significantly outperforms the SOTA method across zero-shot unlabeled and instance segmentation tasks. ZeroPS does not require additional training or fine-tuning for the FMs. ZeroPS applies to both simulated and real-world data. It is hardly affected by domain shift. The project page is available at https://luis2088.github.io/ZeroPS_page.
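To make the multi-view idea in the abstract concrete, the following is a minimal sketch of how per-view 2D labels could be aggregated onto 3D points by majority vote. It is not the ZeroPS implementation: the pinhole camera model, the view dictionary layout, and the names `project` and `vote_labels` are all assumptions made for illustration, and occlusion is deliberately ignored (a real pipeline would need a visibility test).

```python
from collections import Counter


def project(point, view):
    """Project a 3D point into a view with a simple pinhole model (assumed).

    Returns integer pixel coordinates (u, v), or None if the point lies
    behind the camera.
    """
    R, t = view["R"], view["t"]
    # Camera-frame coordinates: R @ point + t
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None
    u = int(round(view["f"] * x / z + view["cx"]))
    v = int(round(view["f"] * y / z + view["cy"]))
    return u, v


def vote_labels(points, views, unlabeled=-1):
    """Give each 3D point the label most views agree on.

    Hypothetical helper: no occlusion handling, so every view in which
    the point projects inside the image contributes a vote.
    """
    result = []
    for p in points:
        votes = Counter()
        for view in views:
            pix = project(p, view)
            if pix is None:
                continue
            u, v = pix
            label_map = view["labels"]
            if 0 <= v < len(label_map) and 0 <= u < len(label_map[0]):
                label = label_map[v][u]
                if label != unlabeled:
                    votes[label] += 1
        result.append(votes.most_common(1)[0][0] if votes else unlabeled)
    return result


def make_view(row2):
    """Build a toy 4x4 view whose only labeled pixels are in row 2."""
    labels = [[-1] * 4 for _ in range(4)]
    labels[2] = row2
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    return {"R": identity, "t": [0, 0, 2], "f": 2, "cx": 2, "cy": 2, "labels": labels}


# Three toy views; the second one disagrees about the pixel at column 3.
views = [
    make_view([-1, -1, 0, 1]),
    make_view([-1, -1, 0, 0]),
    make_view([-1, -1, 0, 1]),
]
points = [(0, 0, 0), (1, 0, 0), (0, 0, -5)]  # the last point is behind the camera
result = vote_labels(points, views)
print(result)  # [0, 1, -1]
```

In this toy setup the second point receives two votes for label 1 and one for label 0, so the disagreeing view is outvoted, which is the intuition behind using multiple observations to stabilize predictions.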
Problem

Research questions and friction points this paper is trying to address.

Zero-shot 3D part segmentation
Cross-modal knowledge transfer
2D to 3D lifting using SAM and GLIP
Innovation

Methods, ideas, or system contributions that make the work stand out.

2D to 3D knowledge transfer
Multi-view correspondence exploration
No additional training required
Yuheng Xue
Nanjing University of Information Science and Technology
Nenglun Chen
Nanjing University of Information Science and Technology
chennenglun at nuist.edu.cn
Jun Liu
Singapore University of Technology and Design
Wenyun Sun
Nanjing University of Information Science and Technology