🤖 AI Summary
To address the limited robustness of LiDAR- and camera-based occupancy estimation in adverse weather, this paper proposes 4D-ROLLS, the first weakly supervised occupancy estimation framework for 4D radar. The method generates pseudo-LiDAR labels from LiDAR point clouds, including occupancy queries and height maps, and uses them as multi-stage supervision; it then aligns the radar model with the LiDAR-derived occupancy map to refine its accuracy. The network combines voxelized 4D radar encoding with a lightweight 3D U-Net architecture. Comparative experiments support the method's performance, and its robustness in degraded conditions such as rain, fog, and snow, along with its cross-dataset generalization, is demonstrated qualitatively. The model runs at about 30 Hz on a 4060 GPU and transfers to the downstream tasks of BEV segmentation and point cloud occupancy prediction.
📝 Abstract
A comprehensive understanding of 3D scenes is essential for autonomous vehicles (AVs), and among the various perception tasks, occupancy estimation plays a central role by providing a general representation of drivable and occupied space. However, most existing occupancy estimation methods rely on LiDAR or cameras, which perform poorly in degraded environments such as smoke, rain, snow, and fog. In this paper, we propose 4D-ROLLS, the first weakly supervised occupancy estimation method for 4D radar that uses the LiDAR point cloud as the supervisory signal. Specifically, we introduce a method for generating pseudo-LiDAR labels, including occupancy queries and LiDAR height maps, which serve as multi-stage supervision for training the 4D radar occupancy estimation model. The model is then aligned with the occupancy map produced by LiDAR to refine its occupancy estimation accuracy. Extensive comparative experiments validate the performance of 4D-ROLLS. Its robustness in degraded environments and its effectiveness in cross-dataset training are demonstrated qualitatively. The model also transfers seamlessly to the downstream tasks of BEV segmentation and point cloud occupancy prediction, highlighting its potential for broader applications. The lightweight network enables the 4D-ROLLS model to run inference at about 30 Hz on a 4060 GPU. The code of 4D-ROLLS will be made available at https://github.com/CLASS-Lab/4D-ROLLS.
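To make the idea of LiDAR-derived pseudo-labels more concrete, the following is a minimal, dependency-free sketch of how a LiDAR point cloud could be turned into an occupancy label set and a BEV height map. It is not the 4D-ROLLS implementation: the function name `lidar_pseudo_labels`, the cubic voxel grid, the max-height rule, and all parameters are assumptions chosen for illustration, and the paper's actual occupancy queries and height maps may be defined differently.

```python
from math import floor


def lidar_pseudo_labels(points, grid_min, voxel_size, grid_shape):
    """Hypothetical helper: derive weak labels from a LiDAR point cloud.

    points:     iterable of (x, y, z) coordinates in metres
    grid_min:   (x, y, z) of the grid's minimum corner
    voxel_size: edge length of a cubic voxel in metres (assumed cubic)
    grid_shape: (nx, ny, nz) number of voxels along each axis

    Returns a set of occupied voxel indices (i, j, k) and an nx-by-ny
    height map holding the highest z seen in each BEV cell (None if empty).
    """
    nx, ny, nz = grid_shape
    occupied = set()
    height_map = [[None] * ny for _ in range(nx)]

    for x, y, z in points:
        # Map the metric coordinate to integer voxel indices.
        i = floor((x - grid_min[0]) / voxel_size)
        j = floor((y - grid_min[1]) / voxel_size)
        k = floor((z - grid_min[2]) / voxel_size)

        # Discard points that fall outside the grid.
        if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
            continue

        # A voxel containing at least one LiDAR return is labelled occupied.
        occupied.add((i, j, k))

        # Track the maximum height per BEV column (assumed height-map rule).
        if height_map[i][j] is None or z > height_map[i][j]:
            height_map[i][j] = z

    return occupied, height_map
```

In a weakly supervised setup of this kind, the occupied set could supervise the radar network's voxel predictions while the height map supervises an intermediate BEV feature stage; voxels without LiDAR returns are left unlabeled rather than treated as confirmed free space.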