ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection

📅 2024-10-14

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-based BEV 3D object detectors often miss objects whose appearance has low contrast with the background. To address this, the paper proposes a Region-Oriented Attention (ROA) mechanism, presented as the first approach to incorporate coarse-grained 2D detection priors into BEV feature learning so that the backbone is explicitly guided toward regions likely to contain objects. ROA combines multi-scale features, large-kernel convolutions, and region-adaptive weighting: the multi-scale structure improves sensitivity to small objects, while the large kernels widen the receptive field to cover large objects. ROA is implemented on top of the BEVDet and BEVDepth frameworks and requires no additional annotations or complex architectural modifications. On the nuScenes benchmark, it achieves absolute improvements of +3.2% in mAP and +2.1% in NDS over the respective baselines. These results suggest that region-guided attention makes BEV representations more robust and more discriminative.
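The region-adaptive weighting described above can be illustrated with a minimal sketch. It is not the paper's implementation: in ROA-BEV the region map is learned by the network, whereas here a hand-written list of coarse 2D boxes stands in for it. The function names `region_mask` and `apply_region_attention`, the `gain` parameter, and the residual form `feature * (1 + gain * mask)` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in feature-map cells, end-exclusive.
Box = Tuple[int, int, int, int]


def region_mask(height: int, width: int, boxes: List[Box]) -> List[List[float]]:
    """Rasterize coarse 2D boxes into a mask: 1.0 inside any box, 0.0 elsewhere."""
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the feature-map bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1.0
    return mask


def apply_region_attention(
    feature: List[List[float]], mask: List[List[float]], gain: float = 1.0
) -> List[List[float]]:
    """Assumed residual re-weighting: boost cells inside regions, keep the rest unchanged."""
    return [
        [f * (1.0 + gain * m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mask)
    ]


# A 4x6 single-channel feature map of ones with one coarse 2x2 object region.
feature = [[1.0] * 6 for _ in range(4)]
mask = region_mask(4, 6, [(1, 1, 3, 3)])
weighted = apply_region_attention(feature, mask, gain=0.5)
```

The residual form leaves background features intact instead of zeroing them, so an imperfect region map cannot erase evidence outright. Whether the paper uses this exact form is not stated in the text above.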

📝 Abstract

Vision-based BEV (Bird's-Eye-View) 3D object detection has recently become popular in autonomous driving. However, existing methods struggle to detect objects that closely resemble the background from the camera perspective. In this paper, we propose 2D Region-oriented Attention for a BEV-based 3D Object Detection Network (ROA-BEV), which makes the backbone focus more on feature learning in areas where objects may exist. Moreover, our method increases the information content of ROA through a multi-scale structure. In addition, every block of ROA uses a large kernel so that the receptive field is large enough to capture information about large objects. Experiments on nuScenes show that ROA-BEV improves performance over both BEVDet and BEVDepth. The code will be released soon.

Problem

Research questions and friction points this paper is trying to address.

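Vision-based BEV 3D detectors miss objects that closely resemble the background in camera views

Backbone features are not focused on the image regions where objects may exist

A limited receptive field and single-scale features make large objects hard to capture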

Innovation

Methods, ideas, or system contributions that make the work stand out.

2D Region-Oriented Attention for BEV

Multi-scale feature learning enhancement

Large kernel for wide receptive field
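The link between kernel size and receptive field in the last item can be checked with a short calculation. The sketch below uses the standard recurrence for stacked convolutions, in which each layer adds `(kernel - 1) * jump` input pixels and `jump` is the product of the strides of all earlier layers. The kernel sizes 3 and 7 and the three-layer depth are hypothetical values chosen for illustration, since the text above does not give the paper's actual settings.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field, in input pixels, of a stack of (kernel_size, stride) conv layers."""
    rf = 1    # a single output cell initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent cells at the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical comparison: three stride-1 layers with small vs. large kernels.
small_kernel_rf = receptive_field([(3, 1)] * 3)
large_kernel_rf = receptive_field([(7, 1)] * 3)
print(small_kernel_rf, large_kernel_rf)  # 7 19
```

At the same depth, the larger kernel nearly triples the span of input each output cell can see, which is the property the summary credits for capturing large objects.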
