MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of infrastructure-side multi-camera 3D object detection—multi-view geometric heterogeneity, diverse camera configurations, degraded visual quality, and complex road layouts—this paper proposes MIC-BEV, a Transformer-based bird's-eye-view (BEV) perception framework. The framework flexibly integrates heterogeneous cameras and introduces a graph-enhanced fusion module that explicitly models camera-to-BEV-grid geometric relationships while jointly aggregating implicit visual features for relation-aware multi-view fusion, building on deformable attention, graph neural networks, and multimodal fusion. Extensive experiments on the synthetic dataset M2I and the real-world dataset RoScenes show that the method achieves state-of-the-art results on both benchmarks and maintains high accuracy under adverse conditions such as extreme weather and sensor degradation, indicating strong potential for deployment in practical intelligent transportation systems.

📝 Abstract
Infrastructure-based perception plays a crucial role in intelligent transportation systems, offering global situational awareness and enabling cooperative autonomy. However, existing camera-based detection models often underperform in such scenarios due to challenges such as multi-view infrastructure setups, diverse camera configurations, degraded visual inputs, and varied road layouts. We introduce MIC-BEV, a Transformer-based bird's-eye-view (BEV) perception framework for infrastructure-based multi-camera 3D object detection. MIC-BEV flexibly supports a variable number of cameras with heterogeneous intrinsic and extrinsic parameters and demonstrates strong robustness under sensor degradation. The proposed graph-enhanced fusion module in MIC-BEV integrates multi-view image features into the BEV space by exploiting geometric relationships between cameras and BEV cells alongside latent visual cues. To support training and evaluation, we introduce M2I, a synthetic dataset for infrastructure-based object detection, featuring diverse camera configurations, road layouts, and environmental conditions. Extensive experiments on both M2I and the real-world dataset RoScenes demonstrate that MIC-BEV achieves state-of-the-art performance in 3D object detection. It also remains robust under challenging conditions, including extreme weather and sensor degradation. These results highlight the potential of MIC-BEV for real-world deployment. The dataset and source code are available at: https://github.com/HandsomeYun/MIC-BEV.
Problem

Research questions and friction points this paper is trying to address.

Addressing 3D object detection challenges in multi-camera infrastructure setups
Handling diverse camera configurations and degraded visual inputs effectively
Improving robustness under extreme weather and sensor degradation conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based BEV framework for multi-camera 3D detection
Graph-enhanced fusion module using geometric relationships
Robust performance under sensor degradation and weather conditions
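The core fusion idea—weighting each camera's contribution to a BEV cell by its geometric relation to that cell—can be sketched minimally as follows. This is an illustrative sketch only: the inverse-distance relation, softmax weighting, and function names are assumptions for exposition, not the paper's actual graph-enhanced module, which additionally learns relations with graph neural networks and deformable attention.

```python
import numpy as np

def relation_weights(cam_positions, bev_cells):
    """Geometric camera-to-cell relation, here a simple inverse-distance softmax.

    cam_positions: (num_cams, 2) camera ground-plane positions
    bev_cells:     (num_cells, 2) BEV cell center coordinates
    returns:       (num_cams, num_cells) weights, summing to 1 over cameras
    """
    # Pairwise camera-to-cell distances via broadcasting
    d = np.linalg.norm(cam_positions[:, None, :] - bev_cells[None, :, :], axis=-1)
    logits = -d  # nearer cameras get larger logits
    w = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    return w / w.sum(axis=0, keepdims=True)

def fuse_to_bev(cam_feats, cam_positions, bev_cells):
    """Relation-aware aggregation of per-camera features into the BEV grid.

    cam_feats: (num_cams, num_cells, C) image features already projected
               onto each BEV cell from each camera view
    returns:   (num_cells, C) fused BEV features
    """
    w = relation_weights(cam_positions, bev_cells)
    return (w[..., None] * cam_feats).sum(axis=0)
```

In the actual framework, such geometric relations would be combined with learned visual affinities rather than distance alone, so that occluded or degraded views are down-weighted even when geometrically close.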
Yun Zhang
University of California, Los Angeles (UCLA), CA 90095, USA
Zhaoliang Zheng
University of California, Los Angeles (UCLA), CA 90095, USA
Johnson Liu
University of California, Los Angeles (UCLA), CA 90095, USA
Zhiyu Huang
Postdoctoral Scholar, University of California, Los Angeles
Machine Learning · Autonomous Driving · Robotics · Embodied AI
Zewei Zhou
University of California, Los Angeles
Deep Learning · Computer Vision · Autonomous Driving · Robotics
Zonglin Meng
University of California, Los Angeles (UCLA), CA 90095, USA
Tianhui Cai
University of California, Los Angeles (UCLA), CA 90095, USA
Jiaqi Ma
University of California, Los Angeles (UCLA), CA 90095, USA