AI Summary
Addressing the dual challenges of sparse annotations and noisy pseudo-labels in 3D point cloud semantic segmentation, this paper proposes a 2D-guided weakly supervised framework. Methodologically, it introduces 2D foundation models (e.g., SAM) for the first time to generate high-quality 2D segmentation masks, which are then geometrically projected into 3D space via multi-view correspondence; combined with point cloud augmentation, a confidence-driven, uncertainty-aware pseudo-label selection and diffusion mechanism is established through consistency regularization. The core contribution is a robust 2D-to-3D cross-domain label propagation paradigm that effectively leverages 2D visual priors to compensate for severe 3D annotation scarcity. Extensive experiments on mainstream benchmarks demonstrate substantial improvements over existing weakly supervised approaches, validating that strong 2D segmentation capability meaningfully enhances learning from sparse 3D supervision.
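The geometric projection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a standard pinhole camera model with known intrinsics and extrinsics, and the function name and its arguments are hypothetical. Each 3D point is projected into an image view, and the 2D segment id (e.g., from a SAM mask) at its pixel location is assigned to it.

```python
import numpy as np

def project_masks_to_points(points, K, T_wc, mask):
    """Assign each 3D point the 2D segment id it projects onto (hypothetical helper).

    points : (N, 3) world-frame point cloud
    K      : (3, 3) camera intrinsics
    T_wc   : (4, 4) world-to-camera extrinsics
    mask   : (H, W) integer segment ids from a 2D model (e.g., SAM); -1 = no segment
    Returns (N,) per-point segment ids; -1 for points that fall outside the view.
    """
    H, W = mask.shape
    # Transform points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_wc @ pts_h.T).T[:, :3]
    z = cam[:, 2]
    # Pinhole perspective projection onto the image plane.
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    labels = np.full(len(points), -1, dtype=int)
    # Keep only points in front of the camera and inside the image bounds.
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels[valid] = mask[v[valid], u[valid]]
    return labels
```

In a multi-view setting, this lookup would be repeated per view and the per-point ids fused (e.g., by majority vote) before extending the sparse annotations across the resulting 3D masks.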
Abstract
Current methods for 3D semantic segmentation propose training models with limited annotations to address the difficulty of annotating large, irregular, and unordered 3D point cloud data. They usually focus on the 3D domain alone, without leveraging the complementary nature of 2D and 3D data. In addition, some methods extend the original labels or generate pseudo labels to guide training, but they often fail to fully exploit these labels or to address the noise within them. Meanwhile, the emergence of comprehensive and adaptable foundation models has offered effective solutions for segmenting 2D data. Leveraging this advancement, we present a novel approach that maximizes the utility of sparsely available 3D annotations by incorporating segmentation masks generated by 2D foundation models. We propagate the 2D segmentation masks into 3D space by establishing geometric correspondences between 3D scenes and 2D views. We then extend the highly sparse annotations to cover the areas delineated by the resulting 3D masks, substantially enlarging the pool of available labels. Furthermore, we apply confidence- and uncertainty-based consistency regularization across augmentations of the 3D point cloud to select reliable pseudo labels, which are then spread over the 3D masks to generate still more labels. This strategy bridges the gap between limited 3D annotations and the powerful capabilities of 2D foundation models, ultimately improving the performance of 3D weakly supervised segmentation.
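The confidence- and uncertainty-based selection step can be illustrated with a short sketch. This is a simplified stand-in for the paper's mechanism, under stated assumptions: the model produces softmax probabilities for the same points under two augmentations, confidence is the maximum averaged probability, uncertainty is the predictive entropy, and the function name and thresholds are hypothetical.

```python
import numpy as np

def select_pseudo_labels(p1, p2, conf_thresh=0.9, ent_thresh=0.5):
    """Pick reliable pseudo labels from predictions on two augmented views (sketch).

    p1, p2 : (N, C) softmax probabilities for the same N points under two
             augmentations of the point cloud.
    Returns (labels, keep): labels (N,) = argmax of the averaged prediction;
    keep (N,) = boolean mask of points passing all reliability checks.
    """
    p = 0.5 * (p1 + p2)
    labels = p.argmax(axis=1)
    # Confidence: peak of the averaged class distribution.
    conf = p.max(axis=1)
    # Uncertainty: predictive entropy (lower = more certain).
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)
    # Consistency: both augmented views must predict the same class.
    agree = p1.argmax(axis=1) == p2.argmax(axis=1)
    keep = agree & (conf >= conf_thresh) & (ent <= ent_thresh)
    return labels, keep
```

Points that pass the filter would then donate their labels to all points sharing the same 3D mask, which is how the selected pseudo labels are spread to generate additional supervision.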