AI Summary
Existing 3D part understanding benchmarks (e.g., PartNet) rely on untextured geometry and expert-dependent annotation, which limits their scalability and usability. To address this, PartNeXt introduces a next-generation dataset of over 23,000 textured 3D models across 50 categories, annotated with fine-grained, hierarchical part labels through a scalable annotation pipeline. The dataset is benchmarked on two tasks: class-agnostic part segmentation and 3D part-centric question answering, a new benchmark for 3D-LLMs. State-of-the-art segmentation methods (e.g., PartField, SAMPart3D) struggle with fine-grained and leaf-level parts, and the question-answering benchmark reveals significant gaps in open-vocabulary part grounding. Notably, training Point-SAM on PartNeXt yields substantial gains over training on PartNet, underscoring the dataset's value in advancing fine-grained 3D understanding.
Abstract
Understanding objects at the level of their constituent parts is fundamental to advancing computer vision, graphics, and robotics. While datasets like PartNet have driven progress in 3D part understanding, their reliance on untextured geometries and expert-dependent annotation limits scalability and usability. We introduce PartNeXt, a next-generation dataset addressing these gaps with over 23,000 high-quality, textured 3D models annotated with fine-grained, hierarchical part labels across 50 categories. We benchmark PartNeXt on two tasks: (1) class-agnostic part segmentation, where state-of-the-art methods (e.g., PartField, SAMPart3D) struggle with fine-grained and leaf-level parts, and (2) 3D part-centric question answering, a new benchmark for 3D-LLMs that reveals significant gaps in open-vocabulary part grounding. Additionally, training Point-SAM on PartNeXt yields substantial gains over PartNet, underscoring the dataset's superior quality and diversity. By combining scalable annotation, texture-aware labels, and multi-task evaluation, PartNeXt opens new avenues for research in structured 3D understanding.