🤖 AI Summary
This work addresses the insufficient detection accuracy of existing radar-camera fusion methods for small and vulnerable road users (VRUs) under adverse weather conditions, as well as the lack of fine-grained, multi-class evaluation. To overcome these limitations, the authors propose a radar-centric fusion framework that uses deformable cross-attention to aggregate features, extracted by the DINOv3 vision foundation model, around transformed reference points in the camera view. This enables full-spectrum fusion between dense FMCW radar tensors and visual semantics, complemented by a cross-modal feature alignment strategy. The method is among the first to report individual detection performance for five object classes on the K-Radar dataset, and it outperforms recent state-of-the-art radar-camera approaches by 12.1% under all-weather conditions.
📝 Abstract
Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches have achieved remarkable performance on detection tasks in adverse weather conditions, they exhibit limitations in resolving the fine-grained spatial details that are particularly critical for detecting small and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection on adverse-weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention, with the vision features provided by a DINOv3 Vision Foundation Model. We present a comprehensive performance evaluation on the K-Radar dataset across all weather conditions and are among the first to report detection performance individually for five object classes. Additionally, we compare our method with existing single-class detection approaches and outperform recent Radar-camera approaches by 12.1%. The code is available at https://github.com/chr-is-tof/RADE-Net.
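The aggregation step described above — projecting a radar reference point into the camera view and attending to vision features at sampled offsets around it — can be sketched in a simplified, single-point, single-head form. This is an illustrative sketch only, not the paper's implementation: all function and parameter names here are hypothetical, and in DinoRADE the sampling offsets and attention weights would be predicted by the network while the dense image features come from the DINOv3 backbone.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Bilinearly sample a feature map feat of shape (H, W, C) at
    continuous pixel coordinates xy = (x, y), clamped to the image."""
    H, W, _ = feat.shape
    x = np.clip(xy[0], 0.0, W - 1.0)
    y = np.clip(xy[1], 0.0, H - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feat[y0, x0]
            + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0]
            + wx * wy * feat[y1, x1])

def deformable_cross_attention(vision_feat, ref_point, offsets, attn_logits):
    """Aggregate vision features around one projected reference point.

    vision_feat : (H, W, C) dense image features (e.g. a DINOv3 feature map)
    ref_point   : (2,) radar point projected into camera pixel coordinates
    offsets     : (K, 2) sampling offsets (learned in a real model)
    attn_logits : (K,) per-sample attention scores (learned in a real model)
    """
    weights = np.exp(attn_logits - attn_logits.max())
    weights /= weights.sum()  # softmax over the K sampling locations
    samples = np.stack([bilinear_sample(vision_feat, ref_point + off)
                        for off in offsets])         # (K, C)
    return (weights[:, None] * samples).sum(axis=0)  # (C,) fused feature

# Toy check: on a constant feature map, the aggregate must equal the
# constant, since bilinear sampling is exact for constants and the
# attention weights sum to 1.
feat = np.full((16, 16, 8), 3.0)
out = deformable_cross_attention(
    feat,
    ref_point=np.array([7.3, 4.8]),   # hypothetical projected radar point
    offsets=np.random.randn(4, 2),    # stand-in for learned offsets
    attn_logits=np.random.randn(4),   # stand-in for learned scores
)
print(out.shape)  # (8,)
```

In the full model this per-point aggregation would run batched over all radar reference points and attention heads, typically against multi-scale feature maps; the numpy loop above only makes the sampling-and-weighting logic explicit.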