OccLE: Label-Efficient 3D Semantic Occupancy Prediction

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high cost of voxel-level fully supervised annotation and the limited performance of self-supervised methods in 3D semantic occupancy prediction, this paper proposes a sparse-supervision learning framework tailored for autonomous driving. Methodologically, it introduces: (1) a semantic-geometric decoupled learning paradigm; (2) a Dual Mamba feature-fusion mechanism coupled with scatter-accumulated projection supervision to enable cross-modal (image–LiDAR) pseudo-label alignment and semi-supervised geometric enhancement; and (3) knowledge distillation from 2D foundation models integrated with sparse-annotation-guided pseudo-label refinement. Evaluated on the SemanticKITTI validation set, the method achieves 16.59% mIoU using only 10% of voxel-level annotations, approaching fully supervised baselines while drastically reducing annotation dependency.

📝 Abstract
3D semantic occupancy prediction offers intuitive and efficient scene understanding and has attracted significant interest in autonomous driving perception. Existing approaches either rely on full supervision, which demands costly voxel-level annotations, or on self-supervision, which provides limited guidance and yields suboptimal performance. To address these challenges, we propose OccLE, a label-efficient 3D semantic occupancy prediction method that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations. Our intuition is to decouple the semantic and geometric learning tasks and then fuse the learned feature grids from both tasks for the final semantic occupancy prediction. The semantic branch therefore distills a 2D foundation model to provide aligned pseudo labels for 2D and 3D semantic learning. The geometric branch integrates image and LiDAR inputs in cross-plane synergy based on their inherent characteristics, employing semi-supervision to enhance geometry learning. We fuse the semantic and geometric feature grids through Dual Mamba and incorporate a scatter-accumulated projection to supervise unannotated predictions with aligned pseudo labels. Experiments show that OccLE achieves competitive performance with only 10% of voxel annotations, reaching an mIoU of 16.59% on the SemanticKITTI validation set.
Problem

Research questions and friction points this paper is trying to address.

Reducing costly voxel-level annotations in 3D semantic occupancy prediction
Decoupling semantic and geometric learning for efficient feature fusion
Achieving high performance with limited supervision in autonomous driving perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples semantic and geometric learning tasks
Fuses semantic-geometric feature grids via Dual Mamba
Uses scatter-accumulated projection for supervision
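The scatter-accumulated projection supervision can be pictured as a scatter-add of per-voxel class probabilities onto the 2D pseudo-label grid, followed by a cross-entropy loss against the distilled pseudo labels. The NumPy sketch below is a minimal illustration of that idea only; all shapes, variable names, and the voxel-to-pixel projection indices are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical setup: N voxels, C semantic classes, an H x W pseudo-label
# image, and precomputed voxel -> pixel projection indices.
N, C, H, W = 6, 3, 2, 2
rng = np.random.default_rng(0)
voxel_logits = rng.normal(size=(N, C))   # per-voxel class logits (placeholder)
pix = rng.integers(0, H * W, size=N)     # assumed projection index per voxel

# Softmax over classes, then scatter-accumulate: sum the probabilities of
# all voxels that project onto the same pixel, and renormalise per pixel.
probs = np.exp(voxel_logits)
probs /= probs.sum(axis=1, keepdims=True)
acc = np.zeros((H * W, C))
np.add.at(acc, pix, probs)               # unbuffered scatter-add per pixel
counts = np.bincount(pix, minlength=H * W)[:, None]
pixel_probs = np.divide(acc, counts,
                        out=np.full_like(acc, 1.0 / C),  # uniform if no hits
                        where=counts > 0)

# Cross-entropy against 2D pseudo labels (e.g. distilled from a 2D
# foundation model); the labels here are random stand-ins.
pseudo = rng.integers(0, C, size=H * W)
loss = -np.log(pixel_probs[np.arange(H * W), pseudo] + 1e-9).mean()
```

The scatter-add makes the supervision signal flow from sparse 2D pseudo labels back to every voxel along a projection ray, which is what lets unannotated voxels receive a gradient at all.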
Authors
N. Fang — S-Lab, Nanyang Technological University, Singapore
Zheyuan Zhou — Zhejiang University
Fayao Liu — Institute for Infocomm Research, A*STAR (Machine Learning, Computer Vision)
Xulei Yang — Principal Scientist & Group Leader, A*STAR, Singapore (3D Vision, Artificial Intelligence, Medical Imaging)
Jiacheng Wei — School of Computer Science and Engineering, Nanyang Technological University, Singapore
Lemiao Qiu — School of Mechanical Engineering, Zhejiang University, China
Guosheng Lin — Nanyang Technological University (Computer Vision, Machine Learning)