ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
Vision-based BEV 3D object detectors often miss objects whose appearance has low contrast with the background. To address this, the paper proposes a Region-Oriented Attention (ROA) mechanism. ROA is described as the first approach to incorporate coarse-grained 2D detection priors into BEV feature learning, explicitly guiding the backbone network to focus on potential object regions. It combines multi-scale features, large-kernel convolutions, and region-adaptive weighting to improve sensitivity to small objects while widening receptive field coverage for large objects. ROA is implemented on top of the BEVDet and BEVDepth frameworks and requires no additional annotations or complex architectural modifications. On the nuScenes benchmark, it achieves absolute improvements of +3.2% in mAP and +2.1% in NDS over the respective baselines. These results indicate that region-guided attention improves the robustness and discriminability of BEV representations.

📝 Abstract
Vision-based BEV (Bird's-Eye-View) 3D object detection has recently become popular in autonomous driving. However, existing methods struggle to detect objects that closely resemble the background from the camera's perspective. In this paper, we propose 2D Region-oriented Attention for a BEV-based 3D Object Detection Network (ROA-BEV), which makes the backbone focus its feature learning on areas where objects may exist. Moreover, our method increases the information content of ROA through a multi-scale structure. In addition, every block of ROA uses a large kernel so that the receptive field is large enough to capture information about large objects. Experiments on nuScenes show that ROA-BEV improves the performance of both BEVDet and BEVDepth. The code will be released soon.
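The idea of steering feature learning toward likely object regions can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that 2D boxes are rasterized into a binary region mask and that the mask reweights a feature map residually as `feature * (1 + gain * attention)`. The names `boxes_to_region_mask`, `apply_region_attention`, and `gain` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max edge exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def boxes_to_region_mask(height: int, width: int, boxes: List[Box]) -> Grid:
    """Rasterize 2D boxes into a mask: 1.0 inside any box, 0.0 elsewhere."""
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1.0
    return mask


def apply_region_attention(features: Grid, attention: Grid, gain: float = 1.0) -> Grid:
    """Assumed residual reweighting: boost features where attention is high,
    leave them unchanged where attention is zero."""
    return [
        [f * (1.0 + gain * a) for f, a in zip(feat_row, attn_row)]
        for feat_row, attn_row in zip(features, attention)
    ]


if __name__ == "__main__":
    mask = boxes_to_region_mask(4, 4, [(1, 1, 3, 3)])
    feats = [[2.0] * 4 for _ in range(4)]
    for row in apply_region_attention(feats, mask):
        print(row)
```

In a trained network the attention map would be predicted by a learned branch rather than taken from ground-truth boxes; the fixed mask here only shows how region weights change the features.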
Problem

Research questions and friction points this paper is trying to address.

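Vision-based BEV 3D detectors in autonomous driving miss objects that resemble the background
Backbone feature learning needs to focus on regions where objects may exist
The attention signal needs richer information, which a multi-scale structure can provide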
Innovation

Methods, ideas, or system contributions that make the work stand out.

2D Region-Oriented Attention for BEV
Multi-scale feature learning enhancement
Large kernel for wide receptive field
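The effect of kernel size on receptive field can be checked with the standard recurrence for stacked convolutions: each layer adds `(kernel - 1) * jump` to the receptive field, where `jump` is the product of the strides of all earlier layers. The sketch below is a generic calculator, not code from the paper, and the kernel sizes in the usage example (3 and 7) are assumed values chosen only for comparison.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels, along one axis) of a stack of
    convolution layers given as (kernel_size, stride) pairs, without dilation."""
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent outputs so far
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    small = receptive_field([(3, 1)] * 3)  # three 3x3 convolutions, stride 1
    large = receptive_field([(7, 1)] * 3)  # three 7x7 convolutions, stride 1
    print(small, large)  # prints "7 19"
```

With the same depth, the larger kernel covers a much wider window, which is the property a large-kernel block relies on when objects span many pixels.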
Authors

Jiwei Chen
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
Laiyan Ding
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
Chi Zhang
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
Feifei Li
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
Rui Huang
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China