Segment Any 3D-Part in a Scene from a Sentence

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a longstanding limitation in 3D scene understanding: it is confined to object-level semantics and lacks fine-grained, language-driven part-level segmentation. To this end, we propose OpenPart3D, the first open-vocabulary 3D part segmentation framework. Methodologically: (1) we introduce 3D-PU, the first large-scale 3D point cloud dataset with dense, fine-grained part-level annotations; (2) we design a purely 3D multimodal alignment architecture that jointly encodes local and global point cloud features with natural language semantics, eliminating reliance on auxiliary 2D images or predefined category vocabularies; (3) we incorporate synthetic scene augmentation and part-aware contrastive learning to enhance cross-dataset generalization. Extensive evaluations across multiple benchmarks show substantial improvements over prior state-of-the-art methods in open-vocabulary 3D part segmentation. This work marks the first systematic advancement toward language-guided, fine-grained 3D semantic parsing.

📝 Abstract

This paper aims to achieve the segmentation of any 3D part in a scene based on natural language descriptions, extending beyond traditional object-level 3D scene understanding and addressing both data and methodological challenges. Due to the expensive acquisition and annotation burden, existing datasets and methods are predominantly limited to object-level comprehension. To overcome the limitations of data and annotation availability, we introduce the 3D-PU dataset, the first large-scale 3D dataset with dense part annotations, created through an innovative and cost-effective method for constructing synthetic 3D scenes with fine-grained part-level annotations, paving the way for advanced 3D-part scene understanding. On the methodological side, we propose OpenPart3D, a 3D-input-only framework to effectively tackle the challenges of part-level segmentation. Extensive experiments demonstrate the superiority of our approach in open-vocabulary 3D scene understanding tasks at the part level, with strong generalization capabilities across various 3D scene datasets.
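To make the idea of segmenting a part from a sentence concrete, here is a minimal sketch of open-vocabulary matching: each point carries a feature vector, the query sentence is embedded into the same space, and points whose features are similar enough to the query are selected. This is not OpenPart3D's architecture, which the abstract does not detail. The function names, the toy three-dimensional embeddings, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def segment_by_text(point_feats, text_feat, threshold=0.5):
    """Return a boolean mask: True where a point's feature matches the query.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return [cosine(f, text_feat) >= threshold for f in point_feats]

# Hand-made toy embeddings (assumption: real features would come from
# trained point cloud and text encoders, which are not shown here).
point_feats = [
    [1.0, 0.0, 0.0],  # e.g. a point on a chair leg
    [0.9, 0.1, 0.0],  # e.g. another point on the same leg
    [0.0, 1.0, 0.0],  # e.g. a point on the chair seat
]
query_feat = [1.0, 0.0, 0.0]  # stand-in for the embedding of "chair leg"

mask = segment_by_text(point_feats, query_feat)
print(mask)
```

Because the query is free text rather than an index into a fixed label set, the same matching step works for part names that were never enumerated in advance, which is what "open-vocabulary" refers to here.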

Problem

Research questions and friction points this paper is trying to address.

Segment 3D parts using natural language descriptions

Overcome the scarcity of part-level 3D annotations

Develop an open-vocabulary framework for part-level 3D understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the 3D-PU dataset with dense part annotations

Proposes OpenPart3D, a 3D-input-only framework for part-level segmentation

Presents a cost-effective method for constructing synthetic 3D scenes with part-level annotations
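One way synthetic scene construction can yield part labels at no extra annotation cost is to compose objects that are already annotated per part, so every point keeps its label when placed in the scene. The sketch below illustrates that general idea only. It is not the 3D-PU construction pipeline, whose details are not given here. The `compose_scene` function, the dictionary layout, and the random floor placement are hypothetical.

```python
import random

def compose_scene(objects, extent=5.0, seed=0):
    """Place each part-annotated object at a random floor position and
    merge everything into one list of labelled points.

    Hypothetical helper: placement is a plain random translation in x and y,
    with no collision checking or rotation.
    """
    rng = random.Random(seed)
    scene = []
    for instance_id, obj in enumerate(objects):
        dx = rng.uniform(-extent, extent)
        dy = rng.uniform(-extent, extent)
        for (x, y, z), part in zip(obj["points"], obj["parts"]):
            scene.append({
                "xyz": (x + dx, y + dy, z),  # translated point
                "object": obj["name"],       # object-level label
                "instance": instance_id,     # which placed object
                "part": part,                # part label carried over unchanged
            })
    return scene

# Toy objects with two points each (assumption: real assets have dense points).
chair = {"name": "chair", "points": [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)],
         "parts": ["leg", "seat"]}
lamp = {"name": "lamp", "points": [(0.0, 0.0, 0.0), (0.0, 0.0, 1.2)],
        "parts": ["base", "shade"]}

scene = compose_scene([chair, lamp])
for point in scene:
    print(point["object"], point["part"], point["xyz"])
```

Since labels travel with the points, the composed scene has dense part annotations by construction, and varying the seed produces new layouts from the same annotated assets.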

🔎 Similar Papers

Search3D: Hierarchical Open-Vocabulary 3D Segmentation