AI Summary
To address the misalignment of BEV features and the depth estimation errors caused by LiDAR-camera calibration inaccuracies in multimodal 3D object detection, this paper proposes a correction framework with two modules, Local Align and Global Align. Local Align corrects depth estimates using neighborhood-aware features obtained through graph matching, explicitly modeling local geometric mismatches. Global Align mitigates global projection distortions by optimizing cross-modal alignment in the BEV feature space. This work is the first to explicitly model and compensate for geometric mismatches induced by calibration noise within the BEV fusion paradigm. On the nuScenes validation set, the method achieves 70.1% mAP, outperforming BEV Fusion by 1.6%. Under injected calibration noise, the gain over BEV Fusion widens to 8.3%, indicating substantially better robustness. The framework thus narrows the gap between the ideal-calibration assumption and practical sensor deployment, improving both accuracy and reliability in real-world multimodal 3D perception.
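To make the calibration problem concrete, the following minimal sketch shows how a small rotation error in the LiDAR-to-camera extrinsic moves a projected LiDAR point in the image. It is an illustration under stated assumptions, not the paper's noise model: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the yaw-only error, and the function name `project` are all hypothetical, and the point is assumed to be already expressed in a camera-aligned frame with z pointing forward.

```python
import math

def project(point, yaw_err_deg, fx=1000.0, fy=1000.0, cx=800.0, cy=450.0):
    """Project a 3D point (camera frame, z forward) to a pixel after applying
    a hypothetical yaw error to the LiDAR-to-camera rotation.

    Returns (u, v, depth). The intrinsics are illustrative values only.
    """
    x, y, z = point
    a = math.radians(yaw_err_deg)
    # Rotate about the camera y-axis to emulate a miscalibrated extrinsic.
    xr = math.cos(a) * x + math.sin(a) * z
    zr = -math.sin(a) * x + math.cos(a) * z
    # Standard pinhole projection.
    return (fx * xr / zr + cx, fy * y / zr + cy, zr)

point = (0.0, 0.0, 20.0)            # a LiDAR return 20 m straight ahead
u0, v0, d0 = project(point, 0.0)    # ideal calibration
u1, v1, d1 = project(point, 1.0)    # 1 degree of yaw error
pixel_shift = math.hypot(u1 - u0, v1 - v0)
print(f"pixel shift: {pixel_shift:.1f} px")
```

For a point on the optical axis the shift equals `fx * tan(yaw_err)`, so with these assumed intrinsics one degree of error already displaces the projected depth sample by roughly 17 pixels. A depth value can then land on a neighboring object or on the background, which is the kind of local mismatch the summary describes.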
Abstract
Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccurate calibration between the LiDAR and camera sensors. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called Graph BEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our Graph BEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEV Fusion by 1.6% on the nuScenes validation set. Importantly, our Graph BEV outperforms BEV Fusion by 8.3% under conditions with misalignment noise.
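The abstract does not spell out how the neighbor-aware depth features are built, so the sketch below shows only one plausible ingredient rather than the authors' Local Align module: for each LiDAR point projected into the image, collect the depths of its k nearest projected neighbors, so that a single misprojected depth can be checked against its local context. The function name `knn_neighbor_depths`, the brute-force search, and the sample data are assumptions made for illustration.

```python
import math

def knn_neighbor_depths(pixels, depths, k):
    """For each projected point, return the depths of its k nearest
    projected neighbors in the image plane (brute force, O(N^2)).

    pixels: list of (u, v) image coordinates of projected LiDAR points
    depths: list of depths in meters, aligned with `pixels`
    """
    out = []
    for i, (ui, vi) in enumerate(pixels):
        # Distance from point i to every other projected point.
        dists = sorted(
            (math.hypot(ui - uj, vi - vj), j)
            for j, (uj, vj) in enumerate(pixels)
            if j != i
        )
        out.append([depths[j] for _, j in dists[:k]])
    return out

# Three nearby returns on one object and one distant background return.
pixels = [(100.0, 100.0), (102.0, 100.0), (100.0, 103.0), (400.0, 300.0)]
depths = [10.0, 10.2, 9.9, 35.0]
neighbor_depths = knn_neighbor_depths(pixels, depths, k=2)
```

In a real pipeline the brute-force loop would normally be replaced by a spatial index or a batched GPU operation, and the gathered neighbor depths would feed a learned depth network rather than being used directly.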