OccLE: Label-Efficient 3D Semantic Occupancy Prediction

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high cost of voxel-level fully supervised annotation and the limited performance of self-supervised methods in 3D semantic occupancy prediction, this paper proposes a sparse-supervision learning framework tailored for autonomous driving. Methodologically, it introduces: (1) a semantic-geometric decoupled learning paradigm; (2) a Dual Mamba feature-fusion mechanism coupled with scatter-accumulated projection supervision to enable cross-modal (image–LiDAR) pseudo-label alignment and semi-supervised geometric enhancement; and (3) knowledge distillation from 2D foundation models integrated with sparse-annotation-guided pseudo-label refinement. Evaluated on the SemanticKITTI validation set, the method achieves 16.59% mIoU using only 10% of voxel-level annotations, approaching fully supervised baselines while drastically reducing annotation dependency.

📝 Abstract
3D semantic occupancy prediction offers intuitive and efficient scene understanding and has attracted significant interest in autonomous driving perception. Existing approaches either rely on full supervision, which demands costly voxel-level annotations, or on self-supervision, which provides limited guidance and yields suboptimal performance. To address these challenges, we propose OccLE, a label-efficient 3D semantic occupancy prediction method that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations. Our intuition is to decouple the semantic and geometric learning tasks and then fuse the learned feature grids from both tasks for the final semantic occupancy prediction. The semantic branch therefore distills a 2D foundation model to provide aligned pseudo labels for 2D and 3D semantic learning. The geometric branch integrates image and LiDAR inputs in cross-plane synergy based on their inherent characteristics, employing semi-supervision to enhance geometry learning. We fuse the semantic and geometric feature grids through Dual Mamba and incorporate a scatter-accumulated projection to supervise unannotated predictions with aligned pseudo labels. Experiments show that OccLE achieves competitive performance with only 10% of voxel annotations, reaching an mIoU of 16.59% on the SemanticKITTI validation set.
Problem

Research questions and friction points this paper is trying to address.

Reducing costly voxel-level annotations in 3D semantic occupancy prediction
Decoupling semantic and geometric learning for efficient feature fusion
Achieving high performance with limited supervision in autonomous driving perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples semantic and geometric learning tasks
Fuses semantic-geometric feature grids via Dual Mamba
Uses scatter-accumulated projection for supervision
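The scatter-accumulated projection supervision can be pictured as a scatter-add of per-voxel class probabilities onto the 2D pseudo-label grid, followed by a cross-entropy loss against the distilled pseudo labels. The NumPy sketch below is a minimal illustration of that idea only; all shapes, variable names, and the voxel-to-pixel projection indices are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical setup: N voxels, C semantic classes, an H x W pseudo-label
# image, and precomputed voxel -> pixel projection indices.
N, C, H, W = 6, 3, 2, 2
rng = np.random.default_rng(0)
voxel_logits = rng.normal(size=(N, C))   # per-voxel class logits (placeholder)
pix = rng.integers(0, H * W, size=N)     # assumed projection index per voxel

# Softmax over classes, then scatter-accumulate: sum the probabilities of
# all voxels that project onto the same pixel, and renormalise per pixel.
probs = np.exp(voxel_logits)
probs /= probs.sum(axis=1, keepdims=True)
acc = np.zeros((H * W, C))
np.add.at(acc, pix, probs)               # unbuffered scatter-add per pixel
counts = np.bincount(pix, minlength=H * W)[:, None]
pixel_probs = np.divide(acc, counts,
                        out=np.full_like(acc, 1.0 / C),  # uniform if no hits
                        where=counts > 0)

# Cross-entropy against 2D pseudo labels (e.g. distilled from a 2D
# foundation model); the labels here are random stand-ins.
pseudo = rng.integers(0, C, size=H * W)
loss = -np.log(pixel_probs[np.arange(H * W), pseudo] + 1e-9).mean()
```

The scatter-add makes the supervision signal flow from sparse 2D pseudo labels back to every voxel along a projection ray, which is what lets unannotated voxels receive a gradient at all.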
Authors
N. Fang — S-Lab, Nanyang Technological University, Singapore
Zheyuan Zhou — Zhejiang University
Fayao Liu — Institute for Infocomm Research, A*STAR (Machine Learning, Computer Vision)
Xulei Yang — Principal Scientist & Group Leader, A*STAR, Singapore (3D Vision, Artificial Intelligence, Medical Imaging)
Jiacheng Wei — School of Computer Science and Engineering, Nanyang Technological University, Singapore
Lemiao Qiu — School of Mechanical Engineering, Zhejiang University, China
Guosheng Lin — Nanyang Technological University (Computer Vision, Machine Learning)