🤖 AI Summary
This work addresses the insufficient detection accuracy of existing radar-camera fusion methods for small and vulnerable road users (VRUs) under adverse weather conditions, as well as the lack of fine-grained, multi-class evaluation. To overcome these limitations, the authors propose a radar-centric fusion framework that uses deformable cross-attention to aggregate features, extracted by the DINOv3 vision foundation model, around transformed reference points in the camera view. This enables full-spectrum fusion between dense FMCW radar tensors and visual semantics, complemented by a cross-modal feature alignment strategy. The method is among the first to report individual detection performance for five object classes on the K-Radar dataset, and it outperforms recent state-of-the-art radar-camera approaches by 12.1% under all-weather conditions.
📝 Abstract
Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches have achieved remarkable performance on detection tasks in adverse weather conditions, they exhibit limitations in resolving the fine-grained spatial details that are particularly critical for detecting small and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection on adverse-weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention, with the vision features provided by a DINOv3 Vision Foundation Model. We present a comprehensive performance evaluation on the K-Radar dataset across all weather conditions and are among the first to report detection performance individually for five object classes. Additionally, we compare our method with existing single-class detection approaches and outperform recent Radar-camera approaches by 12.1%. The code is available at https://github.com/chr-is-tof/RADE-Net.
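The aggregation step described above — projecting a radar reference point into the camera view and attending to vision features at sampled offsets around it — can be sketched in a simplified, single-point, single-head form. This is an illustrative sketch only, not the paper's implementation: all function and parameter names here are hypothetical, and in DinoRADE the sampling offsets and attention weights would be predicted by the network while the dense image features come from the DINOv3 backbone.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Bilinearly sample a feature map feat of shape (H, W, C) at
    continuous pixel coordinates xy = (x, y), clamped to the image."""
    H, W, _ = feat.shape
    x = np.clip(xy[0], 0.0, W - 1.0)
    y = np.clip(xy[1], 0.0, H - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feat[y0, x0]
            + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0]
            + wx * wy * feat[y1, x1])

def deformable_cross_attention(vision_feat, ref_point, offsets, attn_logits):
    """Aggregate vision features around one projected reference point.

    vision_feat : (H, W, C) dense image features (e.g. a DINOv3 feature map)
    ref_point   : (2,) radar point projected into camera pixel coordinates
    offsets     : (K, 2) sampling offsets (learned in a real model)
    attn_logits : (K,) per-sample attention scores (learned in a real model)
    """
    weights = np.exp(attn_logits - attn_logits.max())
    weights /= weights.sum()  # softmax over the K sampling locations
    samples = np.stack([bilinear_sample(vision_feat, ref_point + off)
                        for off in offsets])         # (K, C)
    return (weights[:, None] * samples).sum(axis=0)  # (C,) fused feature

# Toy check: on a constant feature map, the aggregate must equal the
# constant, since bilinear sampling is exact for constants and the
# attention weights sum to 1.
feat = np.full((16, 16, 8), 3.0)
out = deformable_cross_attention(
    feat,
    ref_point=np.array([7.3, 4.8]),   # hypothetical projected radar point
    offsets=np.random.randn(4, 2),    # stand-in for learned offsets
    attn_logits=np.random.randn(4),   # stand-in for learned scores
)
print(out.shape)  # (8,)
```

In the full model this per-point aggregation would run batched over all radar reference points and attention heads, typically against multi-scale feature maps; the numpy loop above only makes the sampling-and-weighting logic explicit.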