PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data

📅 2025-09-26
📈 Citations: 0
✨ Influential citations: 0
🤖 AI Summary
Existing open-world 3D part segmentation methods rely on 2D foundation models, leading to geometric information loss, surface-only understanding that misses internal structures, uncontrolled decomposition, and poor open-world generalization. To address these limitations, we propose the first promptable part segmentation model trained directly on large-scale native 3D data, bypassing multi-view projection entirely. Our approach employs a triplane-based dual-branch encoder to jointly encode geometric and topological features, and introduces a promptable segmentation decoder coupled with a model-in-the-loop automatic annotation pipeline, enabling both prompt-driven part recognition and one-click fully automated decomposition. Evaluated across multiple benchmarks, our method significantly outperforms state-of-the-art approaches: it achieves high single-prompt segmentation accuracy, enables fine-grained structural analysis, and demonstrates strong open-world generalization, marking a step toward foundation models for 3D part understanding.

๐Ÿ“ Abstract
Segmenting 3D objects into parts is a long-standing challenge in computer vision. To overcome taxonomy constraints and generalize to unseen 3D objects, recent works turn to open-world part segmentation. These approaches typically transfer supervision from 2D foundation models, such as SAM, by lifting multi-view masks into 3D. However, this indirect paradigm fails to capture intrinsic geometry, leading to surface-only understanding, uncontrolled decomposition, and limited generalization. We present PartSAM, the first promptable part segmentation model trained natively on large-scale 3D data. Following the design philosophy of SAM, PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens for scalable part-aware representation learning. To enable large-scale supervision, we further introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs from online assets, providing diverse and fine-grained labels. This combination of scalable architecture and diverse 3D data yields emergent open-world capabilities: with a single prompt, PartSAM achieves highly accurate part identification, and in a Segment-Every-Part mode, it automatically decomposes shapes into both surface and internal structures. Extensive experiments show that PartSAM outperforms state-of-the-art methods by large margins across multiple benchmarks, marking a decisive step toward foundation models for 3D part understanding. Our code and model will be released soon.
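
The authors have not yet released code, so the following is only a minimal sketch of what a triplane-based dual-branch point encoder could look like: one branch embeds points directly, the other scatters point features onto three axis-aligned planes, refines each plane with a small CNN, and samples the result back per point to form per-point tokens for a promptable decoder. All module names, dimensions, and the simple scatter/gather scheme are illustrative assumptions, not PartSAM's actual architecture.

```python
import torch
import torch.nn as nn


class TriplaneDualBranchEncoder(nn.Module):
    """Toy dual-branch encoder: a point-wise MLP branch plus a triplane branch
    that splats point features onto three axis-aligned feature planes, refines
    them with small CNNs, and samples them back per point.
    All sizes and design details here are assumptions, not PartSAM's."""

    def __init__(self, dim: int = 64, res: int = 32):
        super().__init__()
        self.dim, self.res = dim, res
        self.point_branch = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.plane_cnn = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.fuse = nn.Linear(2 * dim, dim)

    def _plane_index(self, xyz, axes):
        # Map [-1, 1] coordinates of the two chosen axes to flat grid indices.
        cells = ((xyz + 1.0) * 0.5 * (self.res - 1)).long().clamp(0, self.res - 1)
        return cells[..., axes[0]] * self.res + cells[..., axes[1]]  # (B, N)

    def forward(self, xyz):
        # xyz: (B, N, 3) surface samples normalized to [-1, 1].
        B, N, _ = xyz.shape
        feats = self.point_branch(xyz)  # per-point branch, (B, N, dim)

        tri = torch.zeros_like(feats)
        for axes in ((0, 1), (0, 2), (1, 2)):  # XY, XZ, YZ planes
            idx = self._plane_index(xyz, axes).unsqueeze(-1).expand(-1, -1, self.dim)
            grid = feats.new_zeros(B, self.res * self.res, self.dim)
            grid.scatter_add_(1, idx, feats)                     # splat (sum pooling)
            grid = grid.view(B, self.res, self.res, self.dim).permute(0, 3, 1, 2)
            grid = self.plane_cnn(grid)                          # refine plane features
            flat = grid.flatten(2).transpose(1, 2)               # (B, res*res, dim)
            tri = tri + torch.gather(flat, 1, idx)               # sample back per point

        # Fused per-point tokens that a promptable decoder could attend over.
        return self.fuse(torch.cat([feats, tri / 3.0], dim=-1))


if __name__ == "__main__":
    points = torch.rand(2, 2048, 3) * 2.0 - 1.0                  # two toy point clouds
    tokens = TriplaneDualBranchEncoder()(points)
    print(tokens.shape)                                           # torch.Size([2, 2048, 64])
```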
Problem

Research questions and friction points this paper is trying to address.

Develops promptable 3D part segmentation using native 3D data
Overcomes surface-only understanding from 2D-to-3D transfer methods
Enables segmentation of both surface and internal structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Native 3D training on large-scale data
Triplane-based dual-branch encoder architecture
Model-in-the-loop annotation pipeline curating over five million shape-part labels (sketched below)
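
The model-in-the-loop annotation pipeline is described only at a high level, so the sketch below is a purely illustrative reading of how such a loop can curate labels: the current model proposes part masks, confident proposals are auto-accepted, uncertain ones are routed to review, and the model is (conceptually) retrained before the next round. The function names, confidence threshold, and synthetic scores are hypothetical stand-ins, not the paper's pipeline.

```python
import random


def propose_part_masks(shape_id, round_idx):
    """Hypothetical stand-in for the current model's Segment-Every-Part
    inference; returns (part_id, confidence) pairs with synthetic scores."""
    random.seed(hash((shape_id, round_idx)) & 0xFFFF)
    return [(f"{shape_id}/part_{i}", random.random()) for i in range(8)]


def human_verify(shape_id, uncertain_parts):
    """Hypothetical manual-review step; here it simply accepts everything."""
    return uncertain_parts


def model_in_the_loop(shape_ids, rounds=3, accept_threshold=0.9):
    """Grow a labeled pool over several rounds: auto-accept confident
    proposals, route uncertain ones to review, then (conceptually) retrain."""
    labeled_pool = []
    for round_idx in range(rounds):
        for shape_id in shape_ids:
            proposals = propose_part_masks(shape_id, round_idx)
            confident = [p for p in proposals if p[1] >= accept_threshold]
            uncertain = [p for p in proposals if p[1] < accept_threshold]
            labeled_pool.append((shape_id, confident + human_verify(shape_id, uncertain)))
        # A real pipeline would fine-tune the segmentation model on
        # labeled_pool here before the next proposal round.
    return labeled_pool


if __name__ == "__main__":
    pool = model_in_the_loop(["chair_001", "table_042"])
    print(len(pool), "annotated shape passes")
```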
Authors
Zhe Zhu
Nanjing University of Aeronautics and Astronautics
Le Wan
Hong Kong University of Science and Technology
Rui Xu
The University of Hong Kong
Yiheng Zhang
National University of Singapore
Honghua Chen
Research Assistant Professor, Lingnan University, Hong Kong
3D Measurement/Vision · 3D Generation · Deep Geometry Learning
Zhiyang Dou
The University of Hong Kong
Cheng Lin
Macau University of Science and Technology
Yuan Liu
Hong Kong University of Science and Technology
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision · Multimodal Fusion · Computer Graphics · Deep Geometry Learning · CAD