🤖 AI Summary
To address the severe degradation in 3D detection performance caused by single-modality sensor failure (LiDAR or camera) in autonomous driving, this paper proposes ReliFusion, a reliability-driven fusion framework operating in bird's-eye-view (BEV) space. Its core contribution is the first explicit quantification of per-modality reliability, combined with spatio-temporal feature aggregation (STFA) and confidence-weighted mutual cross-attention (CW-MCA) to enable fault-aware, dynamic cross-modal fusion. By adaptively modulating modality weights at the BEV feature level, ReliFusion remains robust under sensor degradation. On the nuScenes benchmark, ReliFusion surpasses state-of-the-art methods, maintaining notably high accuracy under limited LiDAR fields of view and severe sensor malfunctions. This demonstrates the value of explicit reliability modeling in multi-modal fusion for safety-critical perception systems.
📝 Abstract
Accurate and robust 3D object detection is essential for autonomous driving, where fusing data from sensors like LiDAR and camera enhances detection accuracy. However, sensor malfunctions such as corruption or disconnection can degrade performance, and existing fusion models often struggle to maintain reliability when one modality fails. To address this, we propose ReliFusion, a novel LiDAR-camera fusion framework operating in the bird's-eye view (BEV) space. ReliFusion integrates three key components: the Spatio-Temporal Feature Aggregation (STFA) module, which captures dependencies across frames to stabilize predictions over time; the Reliability module, which assigns confidence scores to quantify the dependability of each modality under challenging conditions; and the Confidence-Weighted Mutual Cross-Attention (CW-MCA) module, which dynamically balances information from LiDAR and camera modalities based on these confidence scores. Experiments on the nuScenes dataset show that ReliFusion significantly outperforms state-of-the-art methods, achieving superior robustness and accuracy in scenarios with limited LiDAR fields of view and severe sensor malfunctions.
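The paper's exact formulation of the CW-MCA module is not reproduced here, but the mechanism it describes (each modality attends to the other in BEV space, and the results are blended using learned per-modality confidence scores) can be illustrated with a minimal NumPy sketch. The function names, the single-head attention without learned projections, and the scalar confidence scores are all simplifying assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    """Scaled dot-product attention: `query` tokens attend to the other
    modality's tokens (no learned Q/K/V projections in this sketch)."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)          # (Nq, Nk) attention logits
    return softmax(scores, axis=-1) @ value      # (Nq, d) attended features

def confidence_weighted_fusion(bev_lidar, bev_cam, conf_lidar, conf_cam):
    """Toy confidence-weighted mutual cross-attention over flattened BEV
    feature maps of shape (num_cells, channels). `conf_*` are scalar
    reliability scores, e.g. from a learned Reliability module."""
    # Mutual cross-attention: each modality queries the other.
    lidar_att = cross_attention(bev_lidar, bev_cam, bev_cam)
    cam_att = cross_attention(bev_cam, bev_lidar, bev_lidar)
    # Normalize the two confidence scores into fusion weights, so a
    # failing modality (low confidence) is adaptively down-weighted.
    w = softmax(np.array([conf_lidar, conf_cam]))
    return w[0] * lidar_att + w[1] * cam_att
```

When one confidence score is much lower than the other (e.g. a disconnected camera), the softmax drives its fusion weight toward zero, so the fused BEV features fall back on the healthy modality. In practice the confidences would be predicted per cell or per feature rather than as scalars, but the weighting principle is the same.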