PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data

📅 2025-09-26

🤖 AI Summary
Existing 3D part segmentation methods rely on 2D foundation models. This causes a loss of geometric information, surface-only understanding that misses internal structures, uncontrolled decomposition, and poor open-world generalization. To address these limitations, we propose the first promptable part segmentation model trained directly on large-scale native 3D data, bypassing multi-view projection entirely. Our approach uses a triplane-based dual-branch encoder to jointly encode geometric and topological features. It also introduces a promptable segmentation decoder for end-to-end part recognition and one-click, fully automatic decomposition, supported by a model-in-the-loop annotation pipeline that supplies large-scale training labels. Across multiple benchmarks, the method outperforms state-of-the-art approaches by large margins: it achieves high segmentation accuracy from a single prompt, enables fine-grained structural analysis, and generalizes well to open-world objects. The authors position this work as a new paradigm for 3D understanding and generative modeling.
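The summary mentions a triplane representation but gives no encoder details. As a generic illustration only (not PartSAM's encoder), the sketch below shows how a triplane is commonly queried: a 3D point is projected onto three axis-aligned feature planes, each plane is bilinearly sampled, and the three feature vectors are summed. The `[-1, 1]` coordinate convention, sum aggregation, the nested-list plane layout, and the function names `bilinear_sample` and `triplane_feature` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical layout: a feature plane is a square grid where plane[row][col]
# is a feature vector of length C.
FeaturePlane = List[List[List[float]]]


def bilinear_sample(plane: FeaturePlane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a plane at normalized coords u (columns), v (rows) in [-1, 1]."""
    res = len(plane)
    # Map [-1, 1] to continuous pixel coordinates [0, res - 1], clamping out-of-range queries.
    x = min(max((u + 1.0) / 2.0 * (res - 1), 0.0), res - 1.0)
    y = min(max((v + 1.0) / 2.0 * (res - 1), 0.0), res - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    wx, wy = x - x0, y - y0
    f00, f01 = plane[y0][x0], plane[y0][x1]
    f10, f11 = plane[y1][x0], plane[y1][x1]
    # Weighted blend of the four neighbouring feature vectors, channel by channel.
    return [
        (1 - wx) * (1 - wy) * a + wx * (1 - wy) * b + (1 - wx) * wy * c + wx * wy * d
        for a, b, c, d in zip(f00, f01, f10, f11)
    ]


def triplane_feature(planes: Dict[str, FeaturePlane], point: Sequence[float]) -> List[float]:
    """Project a 3D point onto the xy, xz and yz planes, sample each, and sum the features."""
    x, y, z = point
    samples = [
        bilinear_sample(planes["xy"], x, y),
        bilinear_sample(planes["xz"], x, z),
        bilinear_sample(planes["yz"], y, z),
    ]
    # Sum aggregation is one common choice; concatenation or averaging are alternatives.
    return [sum(vals) for vals in zip(*samples)]
```

The appeal of this layout is that memory grows with the square of the plane resolution rather than its cube, while every 3D point, including interior ones, still receives a spatially indexed feature.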

📝 Abstract
Segmenting 3D objects into parts is a long-standing challenge in computer vision. To overcome taxonomy constraints and generalize to unseen 3D objects, recent works turn to open-world part segmentation. These approaches typically transfer supervision from 2D foundation models, such as SAM, by lifting multi-view masks into 3D. However, this indirect paradigm fails to capture intrinsic geometry, leading to surface-only understanding, uncontrolled decomposition, and limited generalization. We present PartSAM, the first promptable part segmentation model trained natively on large-scale 3D data. Following the design philosophy of SAM, PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens for scalable part-aware representation learning. To enable large-scale supervision, we further introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs from online assets, providing diverse and fine-grained labels. This combination of scalable architecture and diverse 3D data yields emergent open-world capabilities: with a single prompt, PartSAM achieves highly accurate part identification, and in a Segment-Every-Part mode, it automatically decomposes shapes into both surface and internal structures. Extensive experiments show that PartSAM outperforms state-of-the-art methods by large margins across multiple benchmarks, marking a decisive step toward foundation models for 3D part understanding. Our code and model will be released soon.
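The abstract describes a Segment-Every-Part mode that decomposes a whole shape automatically, without saying how prompts are placed. One plausible way to drive a promptable model like this, borrowed from the automatic mask generation commonly used with SAM, is to sample many point prompts, decode one mask per prompt, and discard near-duplicate masks with non-maximum suppression. The sketch below assumes farthest point sampling for prompt placement, a hypothetical `predict_mask` callback standing in for the decoder, masks represented as sets of point indices, and an IoU threshold of 0.5; PartSAM's actual procedure may differ.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Point = Sequence[float]
# Hypothetical decoder interface: prompt point index -> (mask as point indices, confidence).
MaskPredictor = Callable[[int], Tuple[Set[int], float]]


def farthest_point_sampling(points: List[Point], k: int, start: int = 0) -> List[int]:
    """Greedily pick up to k point indices that are spread out over the shape."""
    chosen = [start]
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        # Each point keeps its distance to the nearest prompt chosen so far.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def mask_iou(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def segment_every_part(points: List[Point], predict_mask: MaskPredictor,
                       num_prompts: int = 32, iou_thresh: float = 0.5,
                       min_score: float = 0.0) -> List[Set[int]]:
    """Prompt the model at spread-out points, then keep non-overlapping masks (mask NMS)."""
    candidates = []
    for idx in farthest_point_sampling(points, num_prompts):
        mask, score = predict_mask(idx)
        if mask and score >= min_score:
            candidates.append((score, mask))
    # Highest-confidence masks first; the sort is stable for equal scores.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Set[int]] = []
    for _, mask in candidates:
        if all(mask_iou(mask, m) < iou_thresh for m in kept):
            kept.append(mask)
    return kept
```

Farthest point sampling is used here so that prompts cover small and distant parts instead of clustering on large surfaces; a regular grid or random sampling would also work with the same suppression step.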
Problem

Research questions and friction points this paper is trying to address.

Develops promptable 3D part segmentation using native 3D data
Overcomes the surface-only understanding of methods that lift 2D masks into 3D
Enables segmentation of both surface and internal structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Native 3D training on large-scale data
Triplane-based dual-branch encoder architecture
Model-in-the-loop annotation pipeline curating over five million 3D shape-part pairs