Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework

📅 2024-07-08
🏛️ arXiv.org

🤖 AI Summary
LiDAR point cloud-based 3D object detection suffers from three problems: geometric and semantic features are modeled in isolation, ground points cause severe interference, and lightweight models have limited accuracy. To address them, this paper proposes a semantic-aware multi-branch, two-stage detection framework. Its main contributions are: (1) a Semantic-aware Multi-branch Sampling (SMS) module that combines random sampling, Density Equalization Sampling (DES), and Ground Abandonment Sampling (GAS); (2) a Consistent KeyPoint Selection (CKPS) module that generates consistent keypoint masks for proposal sampling, together with a multi-view consistency loss that strengthens cross-view geometric and semantic alignment in the first stage; and (3) a Multi-View Fusion Pooling (MVFP) module that fuses multi-view features in the second stage. Evaluated on the KITTI dataset and Waymo Open Dataset, the framework consistently improves detection performance across diverse backbone architectures, boosting the 3D mAP of lightweight models by up to 6.2% while substantially reducing false positives on road surfaces.
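
The three SMS branches can be illustrated with a minimal NumPy sketch. The exact DES and GAS procedures are not given here, so the code assumes simple stand-ins: DES gives every radial-distance bin the same total sampling weight so that sparse, distant regions are drawn more often, and GAS discards points below an assumed ground height (`ground_z=-1.5` m, roughly matching a roof-mounted LiDAR frame such as KITTI's) before random sampling. All function names, parameters, and the synthetic point cloud are hypothetical.

```python
import numpy as np


def random_sampling(points, n, rng):
    """Branch 1: uniform random sampling of n points."""
    replace = len(points) < n  # resample with replacement only if the cloud is too small
    idx = rng.choice(len(points), size=n, replace=replace)
    return points[idx]


def density_equalization_sampling(points, n, rng, num_bins=8):
    """Branch 2, hypothetical DES stand-in: every non-empty radial-distance bin
    gets equal total sampling weight, so sparse distant points are drawn more often."""
    dist = np.linalg.norm(points[:, :2], axis=1)
    edges = np.linspace(dist.min(), dist.max() + 1e-6, num_bins + 1)
    bins = np.digitize(dist, edges) - 1  # bin index in [0, num_bins - 1]
    counts = np.bincount(bins, minlength=num_bins)
    weights = 1.0 / counts[bins]  # inverse bin population per point
    weights /= weights.sum()
    replace = len(points) < n
    idx = rng.choice(len(points), size=n, replace=replace, p=weights)
    return points[idx]


def ground_abandonment_sampling(points, n, rng, ground_z=-1.5):
    """Branch 3, hypothetical GAS stand-in: drop points at or below an assumed
    ground height, then randomly sample the remaining non-ground points."""
    non_ground = points[points[:, 2] > ground_z]
    return random_sampling(non_ground, n, rng)


def semantic_aware_multibranch_sampling(points, n, seed=0):
    """Run all three branches on the same cloud and return one view per branch."""
    rng = np.random.default_rng(seed)
    return {
        "random": random_sampling(points, n, rng),
        "des": density_equalization_sampling(points, n, rng),
        "gas": ground_abandonment_sampling(points, n, rng),
    }


# Synthetic LiDAR-like cloud: point density falls off with range, as on a spinning-LiDAR scan.
gen = np.random.default_rng(42)
num = 5000
r = 2.0 + gen.exponential(scale=15.0, size=num)
theta = gen.uniform(0.0, 2.0 * np.pi, size=num)
z = gen.uniform(-1.8, 2.0, size=num)
cloud = np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

views = semantic_aware_multibranch_sampling(cloud, 1024)
for name, view in views.items():
    mean_range = np.linalg.norm(view[:, :2], axis=1).mean()
    print(f"{name}: {view.shape}, mean range {mean_range:.1f} m")
```

Running it should show the DES view reaching noticeably farther on average than the random view, and the GAS view containing no points at or below the assumed ground height. A fixed height threshold is the crudest possible ground filter; a plane fit would be more robust on sloped roads, but the threshold keeps the sketch short.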

📝 Abstract
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, which provide reliable geometric information. However, traditional preprocessing sampling methods often ignore semantic features, leading to loss of detail and interference from ground points in 3D object detection. To address this, we propose a multi-branch, two-stage 3D object detection framework built on a Semantic-aware Multi-branch Sampling (SMS) module and multi-view consistency constraints. The SMS module comprises random sampling, Density Equalization Sampling (DES) for enhancing distant objects, and Ground Abandonment Sampling (GAS) for focusing on non-ground points. The sampled multi-view points are processed by a Consistent KeyPoint Selection (CKPS) module, which generates consistent keypoint masks for efficient proposal sampling. The first-stage detector uses multi-branch parallel learning with a multi-view consistency loss for feature aggregation, while the second-stage detector fuses multi-view data through a Multi-View Fusion Pooling (MVFP) module to predict 3D objects precisely. Experimental results on the KITTI dataset and Waymo Open Dataset show that our method improves detection performance for a variety of backbones, especially low-performance backbones with simple network structures.
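
The CKPS, consistency-loss, and MVFP steps can be sketched under stated assumptions. Here CKPS is approximated as a mask of the original points chosen by every sampling branch, the multi-view consistency loss as the mean squared deviation of each branch's features from their cross-branch mean on those shared keypoints, and MVFP as element-wise max pooling across branches. These are hypothetical stand-ins, not the authors' formulations; the function names are illustrative, and the per-branch features are random placeholders for what the backbones would produce.

```python
import numpy as np


def consistent_keypoint_mask(view_indices, num_points):
    """Hypothetical CKPS stand-in: True for original points selected by every branch."""
    mask = np.ones(num_points, dtype=bool)
    for idx in view_indices:
        in_view = np.zeros(num_points, dtype=bool)
        in_view[idx] = True
        mask &= in_view  # keep only points present in all branches seen so far
    return mask


def multi_view_consistency_loss(view_feats):
    """Hypothetical consistency loss over features of shape (V, K, C):
    mean squared deviation of each branch from the cross-branch mean."""
    center = view_feats.mean(axis=0, keepdims=True)
    return float(((view_feats - center) ** 2).mean())


def fuse_views(view_feats):
    """Hypothetical MVFP stand-in: element-wise max over the V branches -> (K, C)."""
    return view_feats.max(axis=0)


rng = np.random.default_rng(0)
num_points = 1000
# Indices of the original points kept by each of three sampling branches.
view_idx = [rng.choice(num_points, size=600, replace=False) for _ in range(3)]
mask = consistent_keypoint_mask(view_idx, num_points)
keypoints = np.flatnonzero(mask)

# Placeholder per-branch features for the shared keypoints: (V=3, K, C=16).
feats = rng.normal(size=(3, keypoints.size, 16))
print("shared keypoints:", keypoints.size)
print("consistency loss (random features):", multi_view_consistency_loss(feats))

agreeing = np.repeat(feats[:1], 3, axis=0)  # all branches produce identical features
print("consistency loss (agreeing branches):", multi_view_consistency_loss(agreeing))
print("fused feature shape:", fuse_views(feats).shape)
```

The loss is near zero when the branches agree and grows as their features diverge, which is the behavior a consistency regularizer needs. Max pooling is one common order-invariant way to merge per-view features; averaging or learned attention weights would be equally plausible stand-ins.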
Problem

Research questions and friction points this paper is trying to address.

3D Object Recognition
LiDAR Sensor
Autonomous Vehicles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intelligent Sampling Method
Multi-view Consistency Constraint
Enhanced 3D Object Detection
Hao Jing
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Anhong Wang
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Lijun Zhao
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Yakun Yang
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Donghan Bu
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Jing Zhang
School of Electronic Information Engineering, Taiyuan University of Science and Technology, No. 66 Waliu Road, Taiyuan 030024, China
Yifan Zhang
Junhui Hou
Department of Computer Science, City University of Hong Kong
Neural Spatial Computing