3DCoMPaT++: An improved Large-scale 3D Vision Dataset for Compositional Recognition

📅 2023-10-27
🏛️ arXiv.org
📈 Citations: 13
Influential: 1
🤖 AI Summary
This work addresses the challenging problem of part-level material composition recognition and localization in 3D objects. To this end, the authors introduce Grounded CoMPaT Recognition (GCR), a novel task requiring joint identification and spatial grounding of material combinations at the object-part level. They propose 3DCoMPaT++, a large-scale multimodal dataset comprising 160 million rendered views of more than 10 million stylized 3D models, with fine-grained joint annotations of parts, materials, and semantics covering 293 material classes. The dataset is produced with a controllable Blender-based rendering pipeline using multi-view sampling and a material-part decoupled annotation protocol. The paper also reports on a data challenge organized at CVPR 2023, whose winning method, a modified PointNet++ trained on 6D point-cloud inputs, significantly improved both material localization accuracy and composition recognition performance. 3DCoMPaT++ has since emerged as a key benchmark for compositional 3D vision research.
📝 Abstract
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR2023, showcasing the winning method's utilization of a modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision.
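The abstract describes rendering each stylized shape from four equally spaced views plus four randomized views. A minimal sketch of what such a view-sampling scheme might look like, assuming azimuth-only camera placement (the function name and parameters are illustrative, not from the paper's pipeline):

```python
import math
import random

def sample_view_azimuths(n_fixed: int = 4, n_random: int = 4, seed: int = 0):
    """Return camera azimuth angles (radians) for rendering one shape:
    n_fixed equally spaced views around the object, plus n_random
    uniformly sampled views."""
    fixed = [2.0 * math.pi * i / n_fixed for i in range(n_fixed)]
    rng = random.Random(seed)  # seeded for reproducibility
    randomized = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_random)]
    return fixed + randomized

azimuths = sample_view_azimuths()
print(len(azimuths))  # 8 views per stylized shape
```

With one million stylized shapes in the rendered subset and eight views each, a scheme like this yields the view counts the abstract reports.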
Problem

Research questions and friction points this paper is trying to address.

Develops a large-scale multimodal 2D/3D dataset for compositional recognition
Introduces Grounded CoMPaT Recognition task for material-part composition analysis
Proposes methods to enhance 3D object part segmentation and classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal 2D/3D dataset with 160M views
Instance-level part segmentation with semantic labels
Modified PointNet++ trained on 6D inputs for the GCR task
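The "6D input" here refers to point clouds carrying both geometry and color, i.e. xyz coordinates concatenated with per-point RGB. A minimal sketch of building such an input, assuming standard unit-sphere normalization as preprocessing (the function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def make_6d_point_cloud(xyz: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    """Concatenate 3D coordinates with per-point RGB colors into a 6D input.

    xyz: (N, 3) point coordinates; rgb: (N, 3) colors in [0, 255].
    Returns an (N, 6) float32 array with coordinates centered and scaled
    to the unit sphere and colors normalized to [0, 1].
    """
    xyz = xyz - xyz.mean(axis=0)            # center at the origin
    scale = np.linalg.norm(xyz, axis=1).max()
    xyz = xyz / max(scale, 1e-8)            # fit inside the unit sphere
    rgb = rgb.astype(np.float32) / 255.0    # normalize colors to [0, 1]
    return np.concatenate([xyz, rgb], axis=1).astype(np.float32)

# Illustrative example: 2048 random points with random colors.
points = np.random.rand(2048, 3)
colors = np.random.randint(0, 256, size=(2048, 3))
cloud6d = make_6d_point_cloud(points, colors)
print(cloud6d.shape)  # (2048, 6)
```

A network such as PointNet++ then treats the first three channels as spatial coordinates for neighborhood grouping and the remaining three as per-point features.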
👥 Authors
Habib Slim (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Xiang Li (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Yuchen Li (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Mahmoud Ahmed (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Mohamed Ayman (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Ujjwal Upadhyay (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)
Ahmed Abdelreheem (KAUST)
Arpita Prajapati (Polynine, San Francisco, California)
Suhail Pothigara (Polynine, San Francisco, California)
Peter Wonka (King Abdullah University of Science and Technology, KAUST)
Mohamed Elhoseiny (Department of Computer Science, KAUST, Thuwal, Saudi Arabia)