AI Summary
This work addresses the limited robustness of existing 3D semantic occupancy prediction methods under adverse weather and lighting conditions, which hinders reliable perception for autonomous driving. To overcome this challenge, we propose the first multimodal framework that fuses 4D radar and camera data, leveraging 4D radar's high-precision ranging, velocity estimation, and angular resolution in complex environments alongside the rich semantic cues from images. A depth estimation module lifts 2D pixels into 3D space to enhance scene reconstruction. Furthermore, we introduce the first fully automated annotation pipeline for this task, substantially reducing reliance on manual labeling. Experimental results demonstrate that our approach significantly improves both robustness and accuracy across a variety of challenging scenarios.
Abstract
Autonomous driving requires robust perception across diverse environmental conditions, yet 3D semantic occupancy prediction remains challenging under adverse weather and lighting. In this work, we present the first study combining 4D radar and camera data for 3D semantic occupancy prediction. Our fusion leverages the complementary strengths of both modalities: 4D radar provides reliable range, velocity, and angle measurements in challenging conditions, while cameras contribute rich semantic and texture information. We further show that integrating depth cues from camera pixels enables lifting 2D images to 3D, improving scene reconstruction accuracy. Additionally, we introduce a dataset labeled by a fully automated pipeline for training semantic occupancy models, substantially reducing reliance on costly manual annotation. Experiments demonstrate the robustness of 4D radar across diverse scenarios, highlighting its potential to advance autonomous vehicle perception.
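To make the depth-based lifting step concrete, the sketch below shows a generic pinhole unprojection that turns a per-pixel depth map into camera-frame 3D points. This is a minimal illustration under standard assumptions (pinhole model, metric depth); the function name, intrinsics values, and array shapes are hypothetical and not the paper's actual implementation.

```python
import numpy as np

def lift_pixels_to_3d(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Unproject a dense depth map (H, W) into camera-frame 3D points (H, W, 3).

    Assumes a pinhole camera with intrinsics
    K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] and depth in meters.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid: u runs along image width, v along image height.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth.
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

if __name__ == "__main__":
    # Toy example with hypothetical intrinsics and a constant 10 m depth map.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    depth = np.full((480, 640), 10.0)
    points = lift_pixels_to_3d(depth, K)
    print(points.shape)  # (480, 640, 3)
```

In a radar-camera fusion pipeline, points lifted this way could then be transformed into the ego or radar coordinate frame via an extrinsic calibration before being voxelized and fused with radar features; those downstream steps are method-specific and not shown here.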