🤖 AI Summary
To address the heavy reliance on scarce 3D annotations and the susceptibility to domain shift in 3D LiDAR point cloud semantic segmentation, this paper proposes a purely 2D-driven unsupervised pseudo-label generation framework. Specifically, LiDAR point clouds are rendered into 2D bird’s-eye-view (BEV) and front-view (FV) projections colored by sensor intensity; a 2D semantic segmentation model pretrained on camera imagery is then applied to generate initial 2D labels. These are back-projected onto the 3D points and merged via cross-view majority voting to yield point-level pseudo-labels. Crucially, the method requires no 3D ground-truth labels and no additional modalities (e.g., RGB images) at inference time, establishing an end-to-end 2D-to-3D pseudo-labeling pipeline. Experiments demonstrate the potential of the generated pseudo-labels for unsupervised domain adaptation, and an ablation study assesses the contribution of each component.
📝 Abstract
Semantic segmentation of 3D LiDAR point clouds, essential for autonomous driving and infrastructure management, is best achieved by supervised learning, which demands extensive annotated datasets and suffers from domain shift. We introduce a new 3D semantic segmentation pipeline that leverages aligned scenes and state-of-the-art 2D segmentation methods, avoiding the need for direct 3D annotation or reliance on additional modalities such as camera images at inference time. Our approach generates 2D views from LiDAR scans colored by sensor intensity and applies 2D semantic segmentation to these views using a camera-domain pretrained model. The segmented 2D outputs are then back-projected onto the 3D points, and a simple voting-based estimator merges the labels associated with each 3D point. Our main contribution is a global pipeline for 3D semantic segmentation that requires no prior 3D annotation and no other modality at inference time, and that can be used for pseudo-label generation. We conduct a thorough ablation study and demonstrate the potential of the generated pseudo-labels for the Unsupervised Domain Adaptation task.
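The cross-view label merge described above can be sketched in a few lines. The following is an illustrative pure-Python sketch, not the authors' implementation: the function name, the `ignore` sentinel for points invisible in a view, and the tie-breaking rule (first label encountered wins) are all assumptions.

```python
from collections import Counter

def majority_vote(per_view_labels, ignore=-1):
    """Merge per-view pseudo-labels for each 3D point by majority vote.

    per_view_labels: list of V lists, each of length N, giving one class
    label per point for each projected view (e.g., BEV and front view).
    `ignore` marks points that were not visible in a given view.
    Returns a list of N merged labels (hypothetical convention).
    """
    n_points = len(per_view_labels[0])
    merged = []
    for i in range(n_points):
        # Collect this point's label from every view where it was visible.
        votes = [view[i] for view in per_view_labels if view[i] != ignore]
        # Keep the most frequent label; leave unlabeled if no view saw it.
        merged.append(Counter(votes).most_common(1)[0][0] if votes else ignore)
    return merged

# Example: three views voting on three points.
labels = majority_vote([[1, 2, 0],
                        [1, 2, 1],
                        [3, 2, -1]])  # point 2 is hidden in the third view
```

In this toy example the first point receives label 1 (two views agree), the second label 2 (unanimous), and the third is decided only by the two views in which it is visible.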