🤖 AI Summary
To address challenges in 4D mmWave radar–camera fusion for 3D detection under adverse weather—namely, radar sparsity, semantic poverty, and high computational cost—this paper proposes a geometry-guided progressive two-stage query fusion framework. Our key contributions are: (1) a Wavelet Attention Module, the first of its kind, which compresses redundant radar tensors via the wavelet transform while enhancing joint local–global feature representation; (2) a geometry-constrained cross-modal alignment mechanism enabling modality-agnostic, low-distortion fusion between camera images and multi-view radar tensors; and (3) an efficient feature interaction pathway integrating an FPN with query-based Transformers. Evaluated on the K-Radar benchmark, our method achieves state-of-the-art performance, improving overall mAP by 2.4% and yielding a 1.6% gain in sleet-rain scenes, demonstrating significantly enhanced robustness under extreme weather conditions.
📝 Abstract
4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robot perception due to its low cost and all-weather robustness. However, its inherent sparsity and limited semantic richness significantly constrain perception capability. Recently, fusing camera data with 4D radar has emerged as a promising, cost-effective solution that exploits the complementary strengths of the two modalities. Nevertheless, point-cloud-based radar representations often suffer from information loss introduced by multi-stage signal processing, while directly utilizing raw 4D radar data incurs prohibitive computational costs. To address these challenges, we propose WRCFormer, a novel 3D object detection framework that fuses raw radar cubes with camera inputs via multi-view representations of the decoupled radar cube. Specifically, we design a Wavelet Attention Module as the basic building block of a wavelet-based Feature Pyramid Network (FPN) to enhance the representation of sparse radar signals and image data. We further introduce a two-stage, query-based, modality-agnostic fusion mechanism, termed Geometry-guided Progressive Fusion, to efficiently integrate multi-view features from both modalities. Extensive experiments demonstrate that WRCFormer achieves state-of-the-art performance on the K-Radar benchmark, surpassing the previous best model by approximately 2.4% in all scenarios and 1.6% in the sleet scenario, highlighting its robustness under adverse weather conditions.
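To make the wavelet-based compression idea concrete, the sketch below shows a single-level 2D Haar decomposition, the simplest discrete wavelet transform. It splits a dense feature map into a low-frequency approximation (LL) and three detail bands (LH, HL, HH); the LL band halves each spatial dimension, giving a 4x reduction of the tensor while preserving coarse structure. This is a minimal illustration of the underlying transform, not the paper's Wavelet Attention Module: the actual module would additionally learn to re-weight these bands, and all names and shapes here are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """Single-level 2D Haar transform of an (H, W) array with even H and W."""
    # Pairwise averages/differences along rows (low-pass / high-pass filtering)
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Repeat along columns to obtain the four sub-bands
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0  # coarse approximation
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0  # vertical details
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0  # horizontal details
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0  # diagonal details
    return ll, lh, hl, hh

# A toy 4x4 "radar feature slice": the LL band is 2x2, i.e. 4x fewer values.
x = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(x)
print(ll.shape)  # (2, 2)
```

In a learned module, an attention mechanism could weight the sparse detail bands against the LL approximation before feeding them to the FPN, which is one plausible reading of how compression and local–global feature enhancement are combined.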