🤖 AI Summary
Existing radar-camera fusion methods struggle to achieve robust 3D object detection under adverse weather conditions because annotated real radar data is scarce and costly to produce. To address this limitation, this work proposes CLLAP, a framework that uses abundant LiDAR data to synthesize pseudo-radar signals and introduces a two-stage, dual-modality contrastive learning strategy for self-supervised pretraining that requires no real radar annotations. The pretraining can be applied to existing fusion models in a plug-and-play manner, strengthening their feature extraction and detection robustness. Experiments on the nuScenes and Lyft Level 5 datasets show that CLLAP improves the 3D detection performance of three mainstream baseline models, supporting its effectiveness and generalizability.
📝 Abstract
Accurate 3D object detection is critical for autonomous driving, necessitating reliable, cost-effective sensors capable of operating in adverse weather conditions. Camera and millimeter-wave radar fusion has emerged as a promising solution; however, existing fusion methods often rely on finely annotated radar data, which is scarce and labor-intensive to produce. To address this challenge, we present CLLAP, a Contrastive Learning-based LiDAR-Augmented Pretraining framework that enhances the performance of existing radar-camera fusion methods for 3D object detection. CLLAP leverages abundant LiDAR data to generate pseudo-radar data using the proposed L2R (LiDAR-to-Radar) Sampling method. It then incorporates this data into a novel dual-stage, dual-modality contrastive learning strategy, enabling self-supervised learning from paired pseudo-radar and image data. This approach pretrains existing radar-camera fusion models in a plug-and-play manner, enhancing their feature extraction capabilities and improving detection accuracy and robustness. Experimental results on the nuScenes and Lyft Level 5 datasets demonstrate significant performance improvements across three baseline models, highlighting CLLAP's effectiveness in advancing radar-camera fusion for autonomous driving applications.
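To make the idea of contrastive learning over paired pseudo-radar and image data concrete, here is a minimal sketch. The abstract does not state which contrastive objective CLLAP uses, so this assumes a standard symmetric InfoNCE loss. The function name `symmetric_info_nce`, the temperature of 0.07, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(radar_emb, image_emb, temperature=0.07):
    """Assumed objective: symmetric InfoNCE over paired embeddings.

    radar_emb[i] and image_emb[i] are treated as a positive pair;
    every other combination in the batch acts as a negative.
    """
    radar = [_normalize(v) for v in radar_emb]
    image = [_normalize(v) for v in image_emb]
    # Cosine similarity between every radar/image pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(r, c)) / temperature for c in image]
        for r in radar
    ]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-radar direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


# Toy batch: two pseudo-radar embeddings and two image embeddings.
radar_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]   # each image matches its radar pair
swapped_images = [[0.0, 1.0], [1.0, 0.0]]   # pairs deliberately mismatched

loss_aligned = symmetric_info_nce(radar_batch, aligned_images)
loss_swapped = symmetric_info_nce(radar_batch, swapped_images)
```

In this toy batch the aligned pairing yields a much lower loss than the swapped pairing, which is the signal a contrastive pretraining stage would use to pull matching radar and image features together. In a real pipeline the embeddings would come from the radar and camera encoders of the fusion model being pretrained.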