Enhancing 3D LiDAR Segmentation by Shaping Dense and Accurate 2D Semantic Predictions

📅 2026-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of 3D LiDAR semantic segmentation, where projecting 3D points into 2D intermediate representations often results in sparse and incomplete labels, thereby limiting segmentation performance. To overcome this limitation, the authors propose MM2D3D, a method that reformulates 3D segmentation as a 2D task by leveraging camera images as an auxiliary modality. A cross-modal guided filtering module is introduced to mitigate label sparsity, while a dynamic cross pseudo-supervision mechanism enhances LiDAR-image fusion, yielding dense and accurate 2D semantic predictions. Extensive experiments demonstrate that MM2D3D significantly outperforms existing approaches in both 2D and 3D spaces, achieving superior accuracy and robustness in 3D LiDAR semantic segmentation.
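The dynamic cross pseudo-supervision idea described above — two branches supervising each other with pseudo labels so the sparse LiDAR branch learns to mimic the dense camera branch — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the symmetric cross-entropy formulation, and the single `weight` balancing term are all assumptions for exposition (the paper's "dynamic" weighting scheme is not specified here).

```python
import numpy as np

def cross_pseudo_supervision_loss(logits_lidar, logits_cam, weight=1.0):
    """Illustrative cross pseudo-supervision: each branch's hard argmax
    prediction serves as a pseudo label for the other branch.
    logits_lidar, logits_cam: (N, C) per-pixel class logits."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_lidar = softmax(logits_lidar)
    p_cam = softmax(logits_cam)

    pseudo_from_cam = p_cam.argmax(-1)      # dense camera branch supervises LiDAR branch
    pseudo_from_lidar = p_lidar.argmax(-1)  # LiDAR branch supervises camera branch

    n = len(p_lidar)
    ce_lidar = -np.log(p_lidar[np.arange(n), pseudo_from_cam] + 1e-12).mean()
    ce_cam = -np.log(p_cam[np.arange(n), pseudo_from_lidar] + 1e-12).mean()
    return weight * (ce_lidar + ce_cam)
```

In practice such a loss is added to the supervised segmentation losses of both branches; the camera branch's dense predictions pull the projected-LiDAR branch toward a dense output distribution.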

📝 Abstract
Semantic segmentation of 3D LiDAR point clouds is important in urban remote sensing for understanding real-world street environments. This task, by projecting LiDAR point clouds and 3D semantic labels as sparse maps, can be reformulated as a 2D problem. However, the intrinsic sparsity of the projected LiDAR and label maps can result in sparse and inaccurate intermediate 2D semantic predictions, which in turn limits the final 3D accuracy. To address this issue, we enhance this task by shaping dense and accurate 2D predictions. Specifically, we develop a multi-modal segmentation model, MM2D3D. By leveraging camera images as auxiliary data, we introduce cross-modal guided filtering to overcome label map sparsity by constraining intermediate 2D semantic predictions with dense semantic relations derived from the camera images; and we introduce dynamic cross pseudo supervision to overcome LiDAR map sparsity by encouraging the 2D predictions to emulate the dense distribution of the semantic predictions from the camera images. Experiments show that our techniques enable our model to achieve intermediate 2D semantic predictions with dense distribution and higher accuracy, which effectively enhances the final 3D accuracy. Comparisons with previous methods demonstrate our superior performance in both 2D and 3D spaces.
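To make the guided-filtering idea concrete: a guided filter propagates structure from a dense guidance signal (here, the camera image) into a sparse or noisy target (here, an intermediate 2D prediction map). The sketch below is a plain single-channel guided filter in the style of He et al., shown only as a reference point — the paper's cross-modal guided filtering module is learned and multi-channel, and `box_mean`, `guided_filter`, and their parameters are illustrative names, not the authors' API.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, edge-padded."""
    k = 2 * r + 1
    xp = np.pad(x, r, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Classic guided filter: locally fits src as a linear function of
    guide, so edges of the dense guide shape the filtered src."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(src, r)
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    var_I = box_mean(guide * guide, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)       # local linear coefficients
    b = mean_p - a * mean_I
    return box_mean(a, r) * guide + box_mean(b, r)
```

Applied per class-score channel with the camera image as `guide`, this kind of operation fills in sparse prediction maps while respecting image edges, which is the intuition behind constraining the 2D predictions with dense semantic relations from the camera.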
Problem

Research questions and friction points this paper is trying to address.

3D LiDAR segmentation
semantic segmentation
sparsity
2D projection
point cloud
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal segmentation
cross-modal guided filtering
dynamic cross pseudo supervision
3D LiDAR segmentation
dense semantic prediction
Xiaoyu Dong
The University of Tokyo, Tokyo, Japan; RIKEN AIP, Tokyo, Japan
Tiankui Xian
The University of Tokyo, Tokyo, Japan
Wanshui Gan
The University of Tokyo, Tokyo, Japan; RIKEN AIP, Tokyo, Japan
Naoto Yokoya
The University of Tokyo, RIKEN
Remote Sensing · Computer Vision · Machine Learning · Data Fusion