🤖 AI Summary
To address the limited robustness of 3D object detection for autonomous vehicles under adverse weather, this paper proposes a complementary BEV-space fusion method that integrates 4D millimeter-wave radar and camera modalities. The method comprises four key components: 4D radar signal processing, BEV feature projection, GAN-based depth estimation, and multi-sensor geometric alignment. Its core contributions are: (1) a GAN-based paradigm that synthesizes depth maps directly from radar spectra, compensating for the missing depth modality when no dedicated depth sensor is available; and (2) a depth-guided cross-modal attention mechanism that jointly encodes sparse radar point clouds and dense visual semantic features within a unified BEV representation. Evaluated on a real-world automotive dataset, the approach improves 3D detection mAP by 12.6% and reduces the false-positive rate by 37% under adverse weather (rain, snow, fog), substantially enhancing environmental robustness.
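The depth-guided cross-modal attention fusion can be sketched as scaled dot-product attention in which radar BEV cells act as queries over camera BEV cells. This is a minimal illustrative sketch, not the paper's actual architecture: the projection matrices (here randomly initialized), feature sizes, and residual fusion are all assumptions.

```python
import numpy as np

def cross_modal_attention(radar_bev, cam_bev, d_k=16, seed=0):
    """Fuse sparse radar and dense camera BEV features with scaled
    dot-product attention: radar cells (queries) attend to camera
    cells (keys/values).

    radar_bev: (N, C) flattened radar BEV features
    cam_bev:   (M, C) flattened camera BEV features
    Returns fused (N, C) radar features enriched by camera semantics.
    """
    rng = np.random.default_rng(seed)
    C = radar_bev.shape[1]
    # Hypothetical learned projections, randomly initialized here.
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)

    Q = radar_bev @ Wq                 # (N, d_k)
    K = cam_bev @ Wk                   # (M, d_k)
    V = cam_bev @ Wv                   # (M, C)

    scores = Q @ K.T / np.sqrt(d_k)    # (N, M) similarity logits
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over camera cells
    return radar_bev + attn @ V        # residual fusion

# Toy BEV grids: 8 radar cells, 32 camera cells, 24 channels.
radar = np.random.default_rng(1).standard_normal((8, 24))
cam = np.random.default_rng(2).standard_normal((32, 24))
fused = cross_modal_attention(radar, cam)
print(fused.shape)  # (8, 24)
```

In the paper's setting the attention would additionally be conditioned on the GAN-estimated depth; here that guidance is omitted for brevity.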
📝 Abstract
Safety and reliability are crucial for the public acceptance of autonomous driving. To ensure dependable environmental perception, intelligent vehicles must remain both accurate and robust across diverse environments. Millimeter-wave radar, with its high penetration capability, operates effectively in adverse weather such as rain, snow, and fog. Traditional 3D millimeter-wave radars provide only range, Doppler, and azimuth information. Although recently introduced 4D millimeter-wave radars add elevation resolution, their point clouds remain sparse because of Constant False Alarm Rate (CFAR) processing. Cameras, in contrast, offer rich semantic detail but are sensitive to lighting and weather. This paper therefore leverages two highly complementary and cost-effective sensors: the 4D millimeter-wave radar and the camera. By integrating 4D radar spectra with depth-aware camera images and employing attention mechanisms, we fuse texture-rich images with depth-rich radar data in the Bird's Eye View (BEV) perspective, enhancing 3D object detection. Additionally, we propose a GAN-based network that generates depth images from radar spectra when no depth sensor is available, further improving detection accuracy.
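The sparsity that CFAR induces in radar point clouds can be illustrated with a minimal 1-D cell-averaging CFAR detector: only cells whose power clearly exceeds the locally estimated noise floor survive, so the vast majority of spectrum cells produce no points. All parameters below (window sizes, threshold scale) are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def ca_cfar_1d(power, num_train=8, num_guard=2, scale=10.0):
    """Cell-averaging CFAR over a 1-D power spectrum.

    A cell is declared a detection only if its power exceeds `scale`
    times the mean of the surrounding training cells, with guard
    cells around the cell under test excluded from the estimate.
    """
    n = len(power)
    half = num_train // 2 + num_guard
    detections = []
    for i in range(half, n - half):
        left = power[i - half : i - num_guard]          # leading training cells
        right = power[i + num_guard + 1 : i + half + 1]  # trailing training cells
        noise = np.concatenate([left, right]).mean()     # local noise estimate
        if power[i] > scale * noise:
            detections.append(i)
    return detections

rng = np.random.default_rng(0)
spectrum = rng.exponential(1.0, 256)   # noise floor
spectrum[[60, 150]] += 40.0            # two strong simulated targets
hits = ca_cfar_1d(spectrum)
print(60 in hits, 150 in hits)         # True True: only strong returns survive
```

Out of 256 cells, essentially only the two injected targets pass the threshold, which is why CFAR-processed 4D radar yields point clouds far sparser than a camera's dense pixel grid.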