🤖 AI Summary
To address challenges in 4D mmWave radar–camera fusion for 3D detection under adverse weather—namely, radar sparsity, semantic poverty, and high computational cost—this paper proposes a geometry-guided progressive two-stage query fusion framework. Our key contributions are: (1) a Wavelet Attention Module, the first of its kind, which compresses redundant radar tensors via the wavelet transform while enhancing joint local–global feature representation; (2) a geometry-constrained cross-modal alignment mechanism enabling modality-agnostic, low-distortion fusion between camera images and multi-view radar tensors; and (3) an efficient feature interaction pathway integrating an FPN with query-based Transformers. Evaluated on the K-Radar benchmark, our method achieves state-of-the-art performance, improving overall mAP by 2.4% and yielding a 1.6% gain in sleet-rain scenes, demonstrating significantly enhanced robustness under extreme weather conditions.
📝 Abstract
4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robot perception due to its low cost and all-weather robustness. However, its inherent sparsity and limited semantic richness significantly constrain perception capability. Recently, fusing camera data with 4D radar has emerged as a promising, cost-effective solution that exploits the complementary strengths of the two modalities. Nevertheless, point-cloud-based radar representations often suffer from information loss introduced by multi-stage signal processing, while directly utilizing raw 4D radar data incurs prohibitive computational costs. To address these challenges, we propose WRCFormer, a novel 3D object detection framework that fuses raw radar cubes with camera inputs via multi-view representations of the decoupled radar cube. Specifically, we design a Wavelet Attention Module as the basic building block of a wavelet-based Feature Pyramid Network (FPN) to enhance the representation of sparse radar signals and image data. We further introduce a two-stage, query-based, modality-agnostic fusion mechanism, termed Geometry-guided Progressive Fusion, to efficiently integrate multi-view features from both modalities. Extensive experiments demonstrate that WRCFormer achieves state-of-the-art performance on the K-Radar benchmark, surpassing the previous best model by approximately 2.4% in all scenarios and 1.6% in the sleet scenario, highlighting its robustness under adverse weather conditions.
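To make the wavelet-based compression idea concrete, the sketch below shows a single-level 2D Haar decomposition, the simplest discrete wavelet transform. It splits a dense feature map into a low-frequency approximation (LL) and three detail bands (LH, HL, HH); the LL band halves each spatial dimension, giving a 4x reduction of the tensor while preserving coarse structure. This is a minimal illustration of the underlying transform, not the paper's Wavelet Attention Module: the actual module would additionally learn to re-weight these bands, and all names and shapes here are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """Single-level 2D Haar transform of an (H, W) array with even H and W."""
    # Pairwise averages/differences along rows (low-pass / high-pass filtering)
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Repeat along columns to obtain the four sub-bands
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0  # coarse approximation
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0  # vertical details
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0  # horizontal details
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0  # diagonal details
    return ll, lh, hl, hh

# A toy 4x4 "radar feature slice": the LL band is 2x2, i.e. 4x fewer values.
x = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(x)
print(ll.shape)  # (2, 2)
```

In a learned module, an attention mechanism could weight the sparse detail bands against the LL approximation before feeding them to the FPN, which is one plausible reading of how compression and local–global feature enhancement are combined.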