Boosting Instance Awareness via Cross-View Correlation with 4D Radar and Camera for 3D Object Detection

📅 2026-02-24
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work addresses two problems in 3D object detection with 4D radar and camera. First, the sparse and weak geometric cues in 4D radar data hinder effective instance activation. Second, existing radar-camera fusion methods are limited by the level at which they fuse: bird's-eye-view (BEV) fusion provides global context but has weak instance awareness, while perspective-view fusion captures instance detail but lacks global context. To overcome these issues, the authors propose SIFormer, a framework that first suppresses background noise through segmentation- and depth-guided view transformation, then introduces a cross-view instance activation mechanism that propagates 2D instance cues into BEV space, and finally integrates image semantics and radar geometry via a Transformer-based fusion module. Described as the first cross-view instance activation in 4D radar-camera perception, this design combines the complementary strengths of BEV and perspective-view fusion. SIFormer achieves state-of-the-art performance on View-of-Delft, TJ4DRadSet, and nuScenes, improving 3D object detection accuracy under sparse radar conditions.

📝 Abstract
4D millimeter-wave radar has emerged as a promising sensing modality for autonomous driving due to its robustness and affordability. However, its sparse and weak geometric cues make reliable instance activation difficult, limiting the effectiveness of existing radar-camera fusion paradigms. BEV-level fusion offers global scene understanding but suffers from weak instance focus, while perspective-level fusion captures instance details but lacks holistic context. To address these limitations, we propose SIFormer, a scene-instance aware transformer for 3D object detection using 4D radar and camera. SIFormer first suppresses background noise during view transformation through segmentation- and depth-guided localization. It then introduces a cross-view activation mechanism that injects 2D instance cues into BEV space, enabling reliable instance awareness under weak radar geometry. Finally, a transformer-based fusion module aggregates complementary image semantics and radar geometry for robust perception. By enhancing instance awareness, SIFormer bridges the gap between the two paradigms, combining their complementary strengths to address the inherent sparsity of radar and improve detection accuracy. Experiments demonstrate that SIFormer achieves state-of-the-art performance on the View-of-Delft, TJ4DRadSet, and nuScenes datasets. Source code is available at github.com/shawnnnkb/SIFormer.
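
The idea of suppressing background during view transformation can be illustrated with a minimal NumPy sketch. This is not SIFormer's implementation: it assumes a generic lift-splat style projection in which each pixel's feature is scattered into a BEV grid, weighted by a predicted depth distribution and a foreground probability from a segmentation head. The function name `lift_to_bev`, its parameters, and the camera-frame BEV layout are all hypothetical choices made for illustration.

```python
import numpy as np

def lift_to_bev(feat, depth_prob, fg_prob, depth_bins, K, bev_range, bev_size):
    """Scatter image features into a BEV grid (illustrative sketch only).

    feat:       (C, H, W) image features
    depth_prob: (D, H, W) per-pixel depth distribution over D bins
    fg_prob:    (H, W) foreground probability (assumed segmentation output)
    depth_bins: (D,) depth of each bin in metres
    K:          (3, 3) camera intrinsics
    bev_range:  (x_min, x_max, z_min, z_max) in the camera frame, metres
    bev_size:   (nx, nz) number of BEV cells along x and z
    returns:    (C, nz, nx) BEV feature map
    """
    C, H, W = feat.shape
    D = depth_bins.shape[0]
    nx, nz = bev_size
    x_min, x_max, z_min, z_max = bev_range

    # Background pixels get a small weight, so they contribute little to BEV.
    weight = depth_prob * fg_prob[None]                       # (D, H, W)

    # Back-project pixel column u at depth z: x = (u - cx) * z / fx.
    fx, cx = K[0, 0], K[0, 2]
    u = np.arange(W, dtype=np.float64)
    x = (u[None, :] - cx) * depth_bins[:, None] / fx          # (D, W)
    z = np.broadcast_to(depth_bins[:, None], (D, W))          # (D, W)

    # Convert metric coordinates to BEV cell indices.
    ix = np.floor((x - x_min) / (x_max - x_min) * nx).astype(int)
    iz = np.floor((z - z_min) / (z_max - z_min) * nz).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iz >= 0) & (iz < nz)     # (D, W)

    # The BEV cell depends only on (depth bin, column), so repeat over rows.
    cell = np.broadcast_to((iz * nx + ix)[:, None, :], (D, H, W))
    valid3 = np.broadcast_to(valid[:, None, :], (D, H, W))

    contrib = feat[:, None] * weight[None]                    # (C, D, H, W)
    idx = cell[valid3]                                        # (N,)
    vals = contrib[:, valid3]                                 # (C, N)

    # Sum all contributions that land in the same BEV cell.
    bev = np.stack(
        [np.bincount(idx, weights=vals[c], minlength=nz * nx) for c in range(C)]
    )
    return bev.reshape(C, nz, nx)
```

In this sketch a pixel with zero foreground probability contributes nothing to the BEV map, which is one simple way to read "segmentation- and depth-guided" noise suppression; the actual method may differ in every detail.
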
Problem

Research questions and friction points this paper is trying to address.

4D radar
camera fusion
3D object detection
instance awareness
sparse geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D radar
cross-view correlation
instance awareness
transformer fusion
3D object detection

Authors

Xiaokai Bai
Zhejiang University Ph.D student
Multimodal Fusion, 3D object detection, 4D Radar Perception, autonomous driving

Lianqing Zheng
Tongji University Ph.D student
BEV/OCC, VLA, 4D Radar Perception, Multimodal Fusion, Data Closed-Loop

Si-Yuan Cao
Zhejiang University
image alignment, homography estimation, image fusion, place recognition

Xiaohan Zhang
PhD student of Zhejiang University
Computer Vision, Object Detection

Zhe Wu
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Beinan Yu
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; and Jinhua Institute of Zhejiang University, Jinhua 321299, China

Fang Wang
Postdoc, Stanford University
Reading acquisition, dyslexia, cross-linguistic research, bilingualism, cognitive neuroscience

Jie Bai
School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China; and Hangzhou City University Binjiang Innovation Center, Hangzhou 310052, China

Hui-Liang Shen
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China