AI Summary
Existing 3D part segmentation methods rely on 2D foundation models, leading to geometric information loss, inadequate understanding of surface and internal structures, uncontrollable decomposition, and poor open-world generalization. To address these limitations, we propose the first promptable part segmentation model trained directly on large-scale native 3D data, bypassing multi-view projection entirely. Our approach employs a dual-branch encoder built on a triplane representation to jointly encode geometric and topological features, and introduces a promptable segmentation decoder coupled with a model-in-the-loop automatic annotation pipeline, enabling end-to-end part recognition and one-click fully automated decomposition. Evaluated across multiple benchmarks, our method significantly outperforms state-of-the-art approaches: it achieves high single-prompt segmentation accuracy, enables fine-grained structural analysis, and demonstrates strong open-world generalization. This work establishes a new paradigm for 3D understanding and generative modeling.
Abstract
Segmenting 3D objects into parts is a long-standing challenge in computer vision. To overcome taxonomy constraints and generalize to unseen 3D objects, recent works turn to open-world part segmentation. These approaches typically transfer supervision from 2D foundation models, such as SAM, by lifting multi-view masks into 3D. However, this indirect paradigm fails to capture intrinsic geometry, leading to surface-only understanding, uncontrolled decomposition, and limited generalization. We present PartSAM, the first promptable part segmentation model trained natively on large-scale 3D data. Following the design philosophy of SAM, PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens for scalable part-aware representation learning. To enable large-scale supervision, we further introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs from online assets, providing diverse and fine-grained labels. This combination of scalable architecture and diverse 3D data yields emergent open-world capabilities: with a single prompt, PartSAM achieves highly accurate part identification, and in a Segment-Every-Part mode, it automatically decomposes shapes into both surface and internal structures. Extensive experiments show that PartSAM outperforms state-of-the-art methods by large margins across multiple benchmarks, marking a decisive step toward foundation models for 3D part understanding. Our code and model will be released soon.
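To make the triplane idea concrete, below is a minimal, hypothetical sketch of rasterizing a point cloud onto three axis-aligned planes and flattening the cells into spatially structured tokens. This is an illustration of the general triplane representation only, not PartSAM's encoder, which learns feature planes with a dual-branch network rather than binary occupancy; all function names here are invented for this example.

```python
import numpy as np

def points_to_triplane(points, resolution=32):
    """Rasterize points in [0, 1]^3 onto three axis-aligned occupancy
    grids (XY, XZ, YZ) -- a toy stand-in for learned triplane features."""
    planes = np.zeros((3, resolution, resolution), dtype=np.float32)
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    for p, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):  # XY, XZ, YZ
        planes[p, idx[:, a], idx[:, b]] = 1.0
    return planes

def triplane_tokens(planes):
    """Turn each grid cell into one token; each token carries one scalar
    per plane here, standing in for learned feature channels."""
    n_planes, r, _ = planes.shape
    return planes.reshape(n_planes, r * r).T  # shape: (r*r, n_planes)

pts = np.random.default_rng(0).random((1024, 3))
planes = points_to_triplane(pts)
tokens = triplane_tokens(planes)
print(planes.shape, tokens.shape)  # (3, 32, 32) (1024, 3)
```

A learned version would replace the binary occupancy write with pooled per-point features and feed the resulting token sequence to a transformer decoder for prompt-conditioned mask prediction.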