GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

📅 2024-03-18

🏛️ European Conference on Computer Vision

📈 Citations: 8

✨ Influential: 0

🤖 AI Summary

To address BEV feature misalignment and depth estimation errors caused by inaccurate LiDAR-camera calibration in multi-modal 3D object detection, this paper proposes GraphBEV, a fusion framework with two correction modules: Local Align and Global Align. Local Align performs neighborhood-aware, graph-matching-based self-correction of depth estimates, explicitly modeling local geometric mismatches. Global Align mitigates global projection distortions by optimizing cross-modal alignment in the BEV feature space. This work is the first to explicitly model and compensate for geometric mismatches induced by calibration noise within the BEV fusion paradigm. On the nuScenes validation set, GraphBEV achieves 70.1% mAP, outperforming BEVFusion by 1.6%. Under injected calibration noise, the gain over BEVFusion widens to 8.3%, indicating substantially better robustness. The framework thus narrows the gap between idealized calibration assumptions and practical sensor deployment, improving both accuracy and reliability in real-world multi-modal 3D perception.

📝 Abstract

Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccurate calibration between the LiDAR and camera sensors. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our GraphBEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEVFusion by 1.6% on the nuScenes validation set. Importantly, GraphBEV outperforms BEVFusion by 8.3% under conditions with misalignment noise.
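To make the calibration problem described in the abstract concrete, the following is a minimal sketch of how a small extrinsic error shifts projected LiDAR points in the image. It is not taken from the paper: the intrinsic matrix K, the sample points, the 1-degree yaw error, and the helper names yaw_rotation and project are all hypothetical values chosen for illustration, and a simple pinhole camera model is assumed.

```python
import numpy as np

def yaw_rotation(deg):
    """Rotation about the camera's vertical (y) axis by `deg` degrees."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), 0.0, np.sin(t)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(t), 0.0, np.cos(t)]])

def project(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical intrinsics (focal length 1000 px) and camera-frame points.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],
                   [2.0, 0.5, 30.0],
                   [-3.0, -0.2, 50.0]])

# Projection with the assumed-correct calibration.
clean_px = project(points, K)

# Projection when the extrinsic rotation is off by 1 degree of yaw.
noisy_px = project(points @ yaw_rotation(1.0).T, K)

# Per-point pixel displacement caused by the calibration error.
pixel_shift = np.linalg.norm(noisy_px - clean_px, axis=1)
print(pixel_shift)
```

Under these assumed values, a rotation error of a single degree moves every projected point by well over ten pixels, so the depth assigned to an image location can come from the wrong surface. This is the kind of projection error the abstract says propagates into the camera branch's depth estimation.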

Problem

Research questions and friction points this paper is trying to address.

Inaccurate calibration between LiDAR and camera sensors.

Misalignment of LiDAR and camera BEV features.

Errors in depth estimation for the camera branch.

Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphBEV framework for robust feature alignment

Local Align module with neighbor-aware depth features

Global Align module to correct LiDAR-camera misalignment
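The idea of neighbor-aware depth features can be illustrated with a short sketch. This is an assumption-laden illustration, not the paper's Local Align implementation: it uses a brute-force nearest-neighbor search in pixel space, and the function name neighbor_depths, the sample pixels, and the choice of k are hypothetical. The intent is only to show how each projected point's depth can be augmented with the depths of nearby projected points, so that a slightly misprojected point still has plausible depth candidates around it.

```python
import numpy as np

def neighbor_depths(pixels, depths, k):
    """Return the depths of each point's k nearest projected neighbors.

    pixels: (N, 2) projected pixel coordinates.
    depths: (N,) depth of each projected point.
    Returns an (N, k) array, nearest neighbor first, excluding the point itself.
    """
    diff = pixels[:, None, :] - pixels[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return depths[idx]

# Hypothetical projected LiDAR points: three close together, one far away.
pixels = np.array([[100.0, 100.0],
                   [102.0, 100.0],
                   [100.0, 103.0],
                   [400.0, 250.0]])
depths = np.array([10.0, 10.2, 9.9, 35.0])

nbr = neighbor_depths(pixels, depths, k=2)

# Stack each point's own depth with its neighbors' depths to form a
# simple neighbor-aware depth feature of shape (N, 1 + k).
depth_feature = np.concatenate([depths[:, None], nbr], axis=1)
print(depth_feature)
```

A brute-force search costs O(N^2) memory and time, which is acceptable for a toy example; a spatial index such as a KD-tree would be the usual choice at point cloud scale.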

🔎 Similar Papers

ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection