๐ค AI Summary
Manual semantic annotation of 3D point clouds is labor-intensive and costly, while direct adaptation of 2D vision foundation models (VFMs) to 3D often yields inconsistent labels across views or frames. Method: We propose the first fully annotation-free framework for multi-domain-consistent 3D semantic labeling. It leverages 2D VFMs (e.g., SAM and CLIP) to generate initial 2Dโ3D projected labels; introduces a Bayesian voxel fusion mechanism for probabilistic, robust voxel-level label aggregation; and employs a lightweight 3D Consistency Network (3D-CN) to explicitly model spatial neighborhood relationships and optimize cross-domain and cross-frame semantic consistency. Contribution/Results: Our framework produces high-quality 3D semantic labels on diverse point cloud scenes. After fine-tuning downstream segmentation models, it achieves an mIoU of 34.2โsubstantially outperforming both fully supervised and weakly supervised baselines.
๐ Abstract
Availability of datasets is a strong driver for research on 3D semantic understanding, and whilst obtaining unlabeled 3D point cloud data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, Vision Foundation Models (VFMs) enable open-set semantic segmentation on camera images, potentially aiding automatic labeling. However,VFMs for 3D data have been limited to adaptations of 2D models, which can introduce inconsistencies to 3D labels. This work introduces Label Any Pointcloud (LeAP), leveraging 2D VFMs to automatically label 3D data with any set of classes in any kind of application whilst ensuring label consistency. Using a Bayesian update, point labels are combined into voxels to improve spatio-temporal consistency. A novel 3D Consistency Network (3D-CN) exploits 3D information to further improve label quality. Through various experiments, we show that our method can generate high-quality 3D semantic labels across diverse fields without any manual labeling. Further, models adapted to new domains using our labels show up to a 34.2 mIoU increase in semantic segmentation tasks.