DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient detection accuracy of existing radar-camera fusion methods for small and vulnerable road users (VRUs) under adverse weather conditions, as well as the lack of fine-grained, multi-class evaluation. To overcome these limitations, the authors propose a radar-centric fusion framework that leverages deformable cross-attention to aggregate features extracted by the DINOv3 vision foundation model around transformed reference points in the camera view. This enables full-spectrum fusion between dense FMCW radar tensors and visual semantics, complemented by a cross-modal feature alignment strategy. The method reports, for the first time, individual detection performance across five object classes on the K-Radar dataset, significantly outperforming state-of-the-art approaches under all-weather conditions and achieving a 12.1% improvement in multi-class detection accuracy.
📝 Abstract
Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches have achieved remarkable performance on detection tasks in adverse weather conditions, they exhibit limitations in resolving fine-grained spatial details, which are particularly critical for detecting smaller and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection in adverse weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention. Vision features are provided by a DINOv3 Vision Foundation Model. We present a comprehensive performance evaluation on the K-Radar dataset in all weather conditions and are among the first to report detection performance individually for five object classes. Additionally, we compare our method with existing single-class detection approaches and outperform recent Radar-camera approaches by 12.1%. The code is available at https://github.com/chr-is-tof/RADE-Net.
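The core fusion step described above can be sketched as follows: for each radar query, a reference point is projected into the camera view, and deformable cross-attention samples the vision feature map at a handful of learned offsets around that point. This is a minimal illustrative sketch of the general deformable-attention mechanism, not the paper's implementation; all module names, shapes, and the single-head/single-scale simplification are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableRefPointSampler(nn.Module):
    """Illustrative sketch (not DinoRADE's code): single-head, single-scale
    deformable cross-attention that aggregates vision features around
    projected reference points in the camera view."""

    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offsets = nn.Linear(dim, n_points * 2)  # learned (dx, dy) per sampling point
        self.weights = nn.Linear(dim, n_points)      # per-point attention weights
        self.proj = nn.Linear(dim, dim)              # output projection

    def forward(self, queries, ref_points, feat_map):
        # queries:    (B, Q, C)    radar query embeddings
        # ref_points: (B, Q, 2)    reference points in normalized [0, 1] image coords
        # feat_map:   (B, C, H, W) vision feature map (e.g. from a ViT backbone)
        B, Q, C = queries.shape
        # Small learned offsets around each reference point
        offs = self.offsets(queries).view(B, Q, self.n_points, 2).tanh() * 0.1
        w = self.weights(queries).softmax(dim=-1)            # (B, Q, P)
        # Convert sampling locations to grid_sample's [-1, 1] convention
        grid = (ref_points.unsqueeze(2) + offs) * 2.0 - 1.0  # (B, Q, P, 2)
        sampled = F.grid_sample(feat_map, grid, align_corners=False)  # (B, C, Q, P)
        # Weighted sum over sampling points, back to (B, Q, C)
        out = (sampled * w.unsqueeze(1)).sum(dim=-1).transpose(1, 2)
        return self.proj(out)

# Usage with toy shapes: 2 scenes, 10 radar queries, 64-dim features, 32x32 feature map
sampler = DeformableRefPointSampler(dim=64)
fused = sampler(torch.randn(2, 10, 64), torch.rand(2, 10, 2), torch.randn(2, 64, 32, 32))
```

In the full method, the reference points would come from projecting radar detections into the camera frame via the extrinsic/intrinsic calibration, and the feature map from DINOv3; both are replaced with random tensors here.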
Problem

Research questions and friction points this paper is trying to address.

adverse weather
multi-class object detection
vulnerable road users
radar-camera fusion
perception robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Radar-Camera Fusion
Vision Foundation Model
Deformable Cross-Attention
Adverse Weather Perception
Multi-class Object Detection
Christof Leitgeb
Infineon Technologies AG, Austria; Graz University of Technology, Austria
Thomas Puchleitner
Infineon Technologies AG, Austria
Max Peter Ronecker
Graz University of Technology
Autonomous Driving · Machine Learning · Perception · Deep Learning · Dynamic Occupancy Grids
Daniel Watzenig
Graz University of Technology