🤖 AI Summary
To address the severe performance degradation of 3D object detection on long-range, sparse LiDAR point clouds, this paper proposes an end-to-end joint scene-completion and detection framework. The method introduces three key components: (1) TransBridge, a transformer-decoder-driven cross-network feature bridging module that enables implicit high-resolution feature sharing between the detection and completion networks; (2) a Dynamic-Static Reconstruction (DSRecon) module that synthesizes dense ground-truth point clouds for supervising scene completion; and (3) an implicit multi-scale feature fusion mechanism that adds no extra computational overhead at inference. Evaluated on nuScenes and the Waymo Open Dataset, the approach improves single-stage detectors by 0.7–1.5 mAP points and two-stage detectors by up to 5.78 mAP points, enhancing detection robustness and generalization in long-range sparse regions without increasing inference latency.
📝 Abstract
3D object detection is essential in autonomous driving, providing vital information about moving objects and obstacles. Detecting objects in distant regions covered by only a few LiDAR points remains a challenge, and numerous strategies have been developed to address point cloud sparsity through densification. This paper presents a joint completion and detection framework that improves detection features in sparse areas while keeping inference cost unchanged. Specifically, we propose TransBridge, a novel transformer-based up-sampling block that fuses features from the detection and completion networks, allowing the detection network to benefit from implicit completion features derived from the completion network. Additionally, we design the Dynamic-Static Reconstruction (DSRecon) module to produce dense LiDAR data for the completion network, meeting its requirement for dense point cloud ground truth. Furthermore, we employ the transformer mechanism to model both channel and spatial relations, yielding a high-resolution feature map used for completion. Extensive experiments on the nuScenes and Waymo datasets demonstrate the effectiveness of the proposed framework. The results show that our framework consistently improves end-to-end 3D object detection, raising mean average precision (mAP) by 0.7 to 1.5 points across multiple methods, indicating its generalization ability. For the two-stage detection framework, it boosts mAP by up to 5.78 points.
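To make the TransBridge idea concrete, here is a minimal sketch of a cross-attention fusion and up-sampling block in PyTorch. All names, shapes, and layer choices are illustrative assumptions, not the authors' implementation: detection BEV features act as queries attending to completion-network features, and the fused result is up-sampled to a higher resolution.

```python
# Hypothetical sketch of a TransBridge-style block (not the paper's code):
# detection features query completion features via cross-attention,
# then the fused map is up-sampled 2x.
import torch
import torch.nn as nn

class TransBridgeSketch(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Transposed conv doubles the spatial resolution of the fused map.
        self.up = nn.ConvTranspose2d(dim, dim, kernel_size=2, stride=2)

    def forward(self, det_feat, comp_feat):
        # det_feat, comp_feat: (B, C, H, W) bird's-eye-view feature maps
        b, c, h, w = det_feat.shape
        q = det_feat.flatten(2).transpose(1, 2)    # (B, H*W, C) queries
        kv = comp_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values
        fused, _ = self.attn(q, kv, kv)            # cross-attention
        fused = self.norm(fused + q)               # residual + layer norm
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return self.up(fused)                      # (B, C, 2H, 2W)

det = torch.randn(1, 64, 16, 16)
comp = torch.randn(1, 64, 16, 16)
out = TransBridgeSketch()(det, comp)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the detection branch consumes the up-sampled fused features, so completion information flows into detection implicitly; at inference, such a block adds no point-cloud densification step.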