EFFOcc: Learning Efficient Occupancy Networks from Minimal Labels for Autonomous Driving

📅 2024-06-11

📈 Citations: 4

✨ Influential: 1

🤖 AI Summary

Existing 3D occupancy networks (OccNets) are computationally heavy and depend on dense voxel-level annotations, which hinders onboard deployment in resource-constrained automotive systems. EFFOcc addresses both problems. First, it introduces a fusion-based OccNet built only from simple 2D operators, avoiding the heavy Conv3D modules and voxel-level transformers common in prior OccNets. With a ResNet-18 image backbone, this fusion-based model has 21.35M parameters and reaches 51.49 mIoU on Occ3D-nuScenes. Second, a multi-stage occupancy-oriented distillation transfers knowledge from the fusion-based model to a vision-only OccNet. Trained on only 40% of the labeled sequences with this distillation, the vision-only model reaches 28.38 mIoU, which is 94.38% of the 30.07 mIoU obtained by the same vision-only OccNet trained on 100% of the labels.

📝 Abstract

3D occupancy prediction (3DOcc) is a rapidly rising and challenging perception task in the field of autonomous driving. Existing 3D occupancy networks (OccNets) are both computationally heavy and label-hungry. In terms of model complexity, OccNets are commonly composed of heavy Conv3D modules or transformers at the voxel level. Moreover, OccNets are supervised with expensive large-scale dense voxel labels. Model and data inefficiencies, caused by excessive network parameters and label annotation requirements, severely hinder the onboard deployment of OccNets. This paper proposes an EFFicient Occupancy learning framework, EFFOcc, that targets minimal network complexity and label requirements while achieving state-of-the-art accuracy. We first propose an efficient fusion-based OccNet that only uses simple 2D operators and improves accuracy to the state-of-the-art on three large-scale benchmarks: Occ3D-nuScenes, Occ3D-Waymo, and OpenOccupancy-nuScenes. On the Occ3D-nuScenes benchmark, the fusion-based model with ResNet-18 as the image backbone has 21.35M parameters and achieves 51.49 in terms of mean Intersection over Union (mIoU). Furthermore, we propose a multi-stage occupancy-oriented distillation to efficiently transfer knowledge to vision-only OccNet. Extensive experiments on occupancy benchmarks show state-of-the-art precision for both fusion-based and vision-based OccNets. For the demonstration of learning with limited labels, we achieve 94.38% of the performance (mIoU = 28.38) of a 100% labeled vision OccNet (mIoU = 30.07) using the same OccNet trained with only 40% labeled sequences and distillation from the fusion-based OccNet.
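The abstract reports accuracy as mean Intersection over Union (mIoU) and quotes a relative figure of 94.38%. As a minimal sketch of how such numbers are computed, the snippet below evaluates mIoU on a toy voxel grid and re-derives the relative figure from the two reported scores. The function name `mean_iou`, the toy grid, and the class count are assumptions made for illustration; the benchmarks' actual evaluation protocols (for example any masking of unobserved voxels) are not reproduced here.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Average per-class IoU over voxels; classes absent from both inputs are skipped."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears in neither prediction nor label
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Hypothetical 2x2x2 voxel grid with three semantic classes.
pred = np.array([0, 0, 1, 1, 2, 2, 0, 1]).reshape(2, 2, 2)
target = np.array([0, 0, 1, 2, 2, 2, 0, 0]).reshape(2, 2, 2)
toy_miou = mean_iou(pred, target, num_classes=3)

# Relative performance from the two mIoU values reported in the abstract.
relative = 28.38 / 30.07
print(f"toy mIoU = {toy_miou:.4f}, relative = {100 * relative:.2f}%")
```

On the toy grid the per-class IoUs are 3/4, 1/3, and 2/3, so the mean is 1.75/3. The ratio 28.38 / 30.07 rounds to the 94.38% quoted in the abstract.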

Problem

Research questions and friction points this paper is trying to address.

Reducing computational complexity in 3D occupancy networks

Minimizing label dependency for autonomous driving perception

Improving accuracy with efficient fusion-based OccNet design
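To make the complexity argument concrete, the sketch below compares the parameter count and multiply-accumulate cost of a single 3x3 2D convolution on a bird's-eye-view grid with a 3x3x3 3D convolution on a voxel grid. The channel width (128) and grid size (200 x 200 x 16) are illustrative assumptions, not values taken from the paper, and `conv_params` is a hypothetical helper.

```python
def conv_params(c_in, c_out, kernel, dims, bias=True):
    """Parameter count of a dense convolution with a cubic kernel in `dims` dimensions."""
    weights = c_in * c_out * kernel ** dims
    return weights + (c_out if bias else 0)


# Assumed layer width; not taken from the paper.
c_in = c_out = 128
p2d = conv_params(c_in, c_out, kernel=3, dims=2)
p3d = conv_params(c_in, c_out, kernel=3, dims=3)

# Assumed grid: X x Y cells for the 2D layer, X x Y x Z voxels for the 3D layer.
X, Y, Z = 200, 200, 16
macs_2d = conv_params(c_in, c_out, 3, 2, bias=False) * X * Y
macs_3d = conv_params(c_in, c_out, 3, 3, bias=False) * X * Y * Z

print(p2d, p3d, macs_3d // macs_2d)
```

Under these assumptions the 3D layer holds roughly three times the parameters of the 2D layer (442,496 versus 147,584), and because it is also evaluated at every height slice it costs 48 times the multiply-accumulates. This ignores whatever extra channels a 2D design may need to encode height, so it is an upper-level intuition rather than a measurement of EFFOcc.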

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient fusion-based OccNet with 2D operators

Multi-stage occupancy-oriented distillation method

Minimal label requirements with high accuracy
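This summary does not describe the individual stages of the occupancy-oriented distillation, so the following is only a generic teacher-student sketch, not the paper's method. It combines a feature-mimicking term with a temperature-softened KL term on per-voxel class logits, with the fusion-based model playing the teacher and the vision-only model the student. All names, shapes, weights, and the temperature are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      temperature=2.0, feat_weight=1.0, logit_weight=1.0):
    """Generic distillation loss: feature MSE plus softened KL(teacher || student)."""
    feat_term = np.mean((student_feat - teacher_feat) ** 2)
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(t * (np.log(t + eps) - np.log(s + eps)), axis=-1).mean()
    # temperature**2 keeps the gradient scale comparable across temperatures
    return float(feat_weight * feat_term + logit_weight * temperature ** 2 * kl)


rng = np.random.default_rng(0)
# Hypothetical shapes: 100 voxels, 32 feature channels, 18 classes.
teacher_feat = rng.normal(size=(100, 32))
teacher_logits = rng.normal(size=(100, 18))
student_feat = teacher_feat + 0.1 * rng.normal(size=(100, 32))
student_logits = teacher_logits + 0.1 * rng.normal(size=(100, 18))

loss_matched = distillation_loss(teacher_feat, teacher_feat, teacher_logits, teacher_logits)
loss_noisy = distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits)
```

The loss is zero when the student reproduces the teacher exactly and grows as the student's features or predictions drift away. In a limited-label setting, a term like this can be applied to unlabeled frames as well, since it needs only the teacher's outputs rather than ground-truth voxels.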
