🤖 AI Summary
To address the difficulty of detecting distant and occluded objects in sparse outdoor LiDAR point clouds, this paper proposes an efficient multi-frame LiDAR perception framework. The method integrates multi-sweep point cloud accumulation, BEV feature projection, and a Transformer-based architecture. Key contributions include: (1) a lightweight, end-to-end learnable Gumbel Spatial Pruning (GSP) layer that dynamically removes redundant points and, being decoupled from the rest of the network, can be plugged into existing point cloud architectures; (2) an increase in the number of fused LiDAR sweeps from the customary 10 to as many as 40 without additional computational overhead; and (3) joint optimization of temporal aggregation and spatial reasoning. Evaluated on nuScenes, the framework achieves +3.2% mAP in 3D object detection and +2.8% mIoU in BEV map segmentation over the TransL baseline, while maintaining identical inference speed.
📝 Abstract
This paper studies point cloud perception in outdoor environments. Because outdoor point clouds are sparse, existing methods struggle to recognize distant or occluded objects. We observe that accumulating multiple temporally consecutive LiDAR sweeps significantly mitigates this problem and markedly improves perception accuracy. However, the computation cost grows with the number of sweeps, which has prevented previous approaches from using a large number of them. To tackle this challenge, we find that a considerable portion of the points in the accumulated point cloud is redundant, and that discarding these points has minimal impact on perception accuracy. We introduce a simple yet effective Gumbel Spatial Pruning (GSP) layer that dynamically prunes points through a sampling process learned end-to-end. The GSP layer is decoupled from other network components and can therefore be integrated seamlessly into existing point cloud network architectures. Without incurring additional computational overhead, we increase the number of LiDAR sweeps from 10, a common practice, to as many as 40, which significantly improves perception performance. For instance, on the nuScenes 3D object detection and BEV map segmentation tasks, our pruning strategy improves the vanilla TransL baseline and other baseline methods.
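The abstract does not spell out the internals of the GSP layer, so the following is only a minimal sketch of the general idea behind Gumbel-based point pruning, not the authors' implementation. It uses the standard Gumbel-max trick: each point gets a "keep" score, Gumbel noise is added to the keep and drop scores, and the point survives if the perturbed keep score wins, which keeps it with probability sigmoid(score). The function names, the `keep_logits` input, and the fixed drop score of 0 are all assumptions made for illustration. Only the forward (sampling) step is shown; learning the scores end-to-end would additionally require a differentiable relaxation such as a straight-through Gumbel-Softmax in an autodiff framework.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Clamp away from 0 so the inner log() is always defined.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_prune(points, keep_logits, seed=None):
    """Sample a hard keep/drop decision per point with the Gumbel-max trick.

    points      : list of (x, y, z) tuples, e.g. from accumulated sweeps
    keep_logits : one score per point (hypothetical output of a small
                  learned scoring head); larger means "more likely kept"
    seed        : optional seed for reproducible sampling
    """
    rng = random.Random(seed)
    kept = []
    for point, logit in zip(points, keep_logits):
        # The drop score is fixed at 0 (an assumption), so the point is
        # kept with probability sigmoid(logit).
        keep_score = logit + gumbel_noise(rng)
        drop_score = 0.0 + gumbel_noise(rng)
        if keep_score > drop_score:
            kept.append(point)
    return kept


# Usage: points with strongly negative scores are pruned away.
cloud = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
scores = [50.0, -50.0, 50.0]
pruned = gumbel_prune(cloud, scores, seed=0)
```

Because the surviving points form an ordinary, smaller point cloud, a layer of this kind can sit in front of an existing backbone without changing it, which is consistent with the decoupling the abstract describes.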