🤖 AI Summary
Current open-vocabulary 3D instance segmentation (OV-3DIS) methods rely on predefined category names at inference time, limiting agent autonomy. This work introduces open-ended 3D instance segmentation (OE-3DIS), the first paradigm enabling autonomous instance segmentation and semantic naming without any prior semantic input. Methodologically, the authors integrate a 2D multimodal large language model for cross-modal semantic understanding and generation, coupled with point cloud feature disentanglement and mask refinement. They further propose an Open-Ended scoring metric that jointly evaluates semantic plausibility and geometric accuracy. Evaluated on ScanNet200 and ScanNet++, the approach significantly outperforms state-of-the-art methods, including Open3DIS, under zero-shot, label-free conditions, demonstrating high-quality segmentation and self-consistent, semantically coherent naming.
📝 Abstract
Open-Vocabulary 3D Instance Segmentation (OV-3DIS) methods have recently demonstrated the ability to generalize to unseen objects. However, these methods still depend on predefined class names during testing, restricting the autonomy of agents. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the need for predefined class names during testing. Moreover, we contribute a comprehensive set of strong baselines, derived from OV-3DIS approaches and leveraging 2D Multimodal Large Language Models. To assess the performance of our OE-3DIS system, we introduce a novel Open-Ended score, which evaluates both the semantic and geometric quality of predicted masks and their associated class names, alongside the standard AP score. Our approach demonstrates significant performance improvements over the baselines on the ScanNet200 and ScanNet++ datasets. Remarkably, our method surpasses the performance of Open3DIS, the current state-of-the-art method in OV-3DIS, even without access to ground-truth object class names.