🤖 AI Summary
This study systematically investigates how sensor occlusion, caused by conditions such as fog, haze, or physical obstructions, affects the performance of BEVFusion, a multimodal 3D object detector. Using the nuScenes dataset, it quantifies camera-only, LiDAR-only, and fused detection performance in bird's-eye view (BEV) space under varying occlusion levels, evaluated with mean Average Precision (mAP) and the nuScenes Detection Score (NDS). With a single modality, moderate camera occlusion already reduces mAP by a relative 41.3% (from 35.6% to 20.9%), whereas LiDAR degrades sharply only under heavy occlusion, where mAP falls by 47.3% (from 64.7% to 34.1%). In fusion mode, the impact of camera occlusion is largely absorbed (mAP ↓4.1%, from 68.5% to 65.7%), yet LiDAR occlusion still incurs a 26.8% mAP drop (to 50.1%), revealing the fused model's stronger reliance on LiDAR. The work characterizes BEVFusion's modality-specific vulnerability to occlusion and underscores the need for occlusion-aware evaluation and for fusion mechanisms that maintain accuracy under partial sensor failure or degradation. These findings offer evidence for improving the reliability of autonomous driving perception systems in adverse environmental conditions.
📝 Abstract
Accurate 3D object detection is essential for automated vehicles to navigate safely in complex real-world environments. Bird's-eye view (BEV) representations, which project multi-sensor data into a top-down spatial format, have emerged as a powerful approach for robust perception. Although BEV-based fusion architectures have demonstrated strong performance through multimodal integration, the effects of sensor occlusions on 3D detection accuracy remain underexplored. Such occlusions can be caused by environmental conditions such as fog and haze or by physical obstructions. In this work, we investigate the impact of occlusions on both camera and Light Detection and Ranging (LiDAR) outputs using the BEVFusion architecture, evaluated on the nuScenes dataset. Detection performance is measured using mean Average Precision (mAP) and the nuScenes Detection Score (NDS). Our results show that moderate camera occlusions lead to a 41.3% relative drop in mAP (from 35.6% to 20.9%) when detection is based only on the camera. LiDAR-only detection, by contrast, drops sharply only under heavy occlusion, where mAP falls by 47.3% (from 64.7% to 34.1%) and long-range detection is severely affected. In fused settings, the effect depends on which sensor is occluded: occluding the camera leads to a minor 4.1% drop (from 68.5% to 65.7%), while occluding LiDAR results in a larger 26.8% drop (from 68.5% to 50.1%), revealing the model's stronger reliance on LiDAR for 3D object detection. Our results highlight the need for future research into occlusion-aware evaluation methods and improved sensor fusion techniques that can maintain detection accuracy in the presence of partial sensor failure or degradation due to adverse environmental conditions.
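The percentage drops quoted above are relative to each baseline mAP, not differences in percentage points. The short sketch below recomputes them from the rounded mAP pairs given in the abstract. The function name `relative_drop` and the dictionary labels are illustrative choices and do not come from the paper. Because the inputs are already rounded, a recomputed value can differ from the quoted figure in the last digit.

```python
def relative_drop(baseline: float, degraded: float) -> float:
    """Return the share of the baseline score that is lost, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - degraded) / baseline * 100.0


# (baseline mAP, occluded mAP) pairs quoted in the abstract, in percent.
# The labels are illustrative, not the paper's own naming.
reported = {
    "camera only, camera occluded": (35.6, 20.9),
    "LiDAR only, LiDAR occluded": (64.7, 34.1),
    "fusion, camera occluded": (68.5, 65.7),
    "fusion, LiDAR occluded": (68.5, 50.1),
}

# Relative drop per setting, computed from the rounded inputs above.
drops = {name: relative_drop(b, d) for name, (b, d) in reported.items()}

for name, (b, d) in reported.items():
    print(f"{name}: {b - d:.1f} points absolute, {drops[name]:.1f}% relative")
```

Reporting both views helps when comparing settings with different baselines: the camera-only model loses fewer absolute mAP points than the LiDAR-only model, yet the two relative drops are of similar size because the camera-only baseline is much lower.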