🤖 AI Summary
To address the limited receptive field and weak global modeling capability of point-based models in single-stage 3D point cloud detection, this paper proposes a Point Dilation Mechanism (PDM). PDM explicitly expands the point cloud feature space through grid dilation in Euclidean space followed by unsupervised feature filling. The filling step uses spherical harmonic coefficients and Gaussian density functions to jointly encode directional and scale information. The method also introduces a sparse voxel height compression module and a dual-branch detection head (scene heatmap prediction plus probability calibration), which are optimized jointly. Built on a PointNet-style backbone, the method requires no additional annotations. On the KITTI benchmark, it achieves state-of-the-art performance among single-modality methods, with leading mean Average Precision (mAP) across multiple classes, while running in real time at 68 FPS. Notably, it markedly improves detection of sparse and incomplete objects.
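The dilation and filling steps can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the PDM-SSD implementation. The grid radius `radius`, cell size `cell`, Gaussian width `sigma`, the restriction to degree-1 real spherical harmonics, and the helper names `dilate_point`, `sh_basis`, and `fill_feature` are all hypothetical choices. Each sampled point is expanded into a cubic grid of cell centers. Each cell then receives the center's feature scaled by two factors: a directional factor, which is a spherical-harmonic expansion evaluated on the offset direction, and a scale factor, which is an isotropic Gaussian density of the offset length.

```python
import itertools
import math

# Real spherical-harmonic constants for degrees 0 and 1 (no Condon-Shortley phase).
# Assumption: PDM-SSD's actual SH degree and convention are not stated in the abstract.
SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))


def dilate_point(center, radius=1, cell=0.2):
    """Hypothetical point dilation: return the centers of a (2*radius+1)^3 grid
    of cubic cells with edge length `cell`, centered on `center`."""
    cx, cy, cz = center
    steps = range(-radius, radius + 1)
    return [
        (cx + i * cell, cy + j * cell, cz + k * cell)
        for i, j, k in itertools.product(steps, steps, steps)
    ]


def sh_basis(direction):
    """Evaluate the 4 real SH basis functions (degree l <= 1) on a unit vector."""
    x, y, z = direction
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


def fill_feature(center, center_feat, cell_pos, sh_coeffs, sigma=0.3):
    """Hypothetical feature filling for one dilated cell.

    center_feat: list of C floats (feature of the sampled point).
    sh_coeffs:   4 coefficients (assumed to be predicted elsewhere) weighting the SH basis.
    Returns a list of C floats: center_feat * directional factor * Gaussian weight.
    """
    offset = [p - c for p, c in zip(cell_pos, center)]
    dist = math.sqrt(sum(o * o for o in offset))
    if dist == 0.0:
        # The center cell has no direction; keep only the isotropic l = 0 term.
        directional = sh_coeffs[0] * SH_C0
    else:
        unit = [o / dist for o in offset]
        directional = sum(c * b for c, b in zip(sh_coeffs, sh_basis(unit)))
    # Isotropic Gaussian density (unnormalized) of the offset length: encodes scale.
    gaussian = math.exp(-dist * dist / (2.0 * sigma * sigma))
    return [f * directional * gaussian for f in center_feat]
```

With `radius=1`, each sampled point yields 27 cells. Cells farther from the center receive exponentially smaller features, and the SH coefficients let the fill favor some directions over others. Every filled value is a differentiable function of `center_feat` and `sh_coeffs`, so losses computed on the dilated cells can send gradients back to the backbone. This matches the stated purpose of filling features so that they can take part in backpropagation.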
📝 Abstract
Current point-based detectors can learn only from the provided points, so their receptive fields are limited and their global learning capability is insufficient for sparse and incomplete targets. In this paper, we present a novel Point Dilation Mechanism for single-stage 3D detection (PDM-SSD) that takes advantage of both point and grid representations. Specifically, we first use a PointNet-style 3D backbone for efficient feature encoding. Then, a neck with the Point Dilation Mechanism (PDM) expands the feature space in two key steps: point dilation and feature filling. The former expands each sampled point into a grid of a certain size centered on it in Euclidean space. The latter fills the unoccupied grid cells with features so that they can take part in backpropagation, using spherical harmonic coefficients and a Gaussian density function to encode direction and scale. Next, we associate multiple dilation centers and fuse their coefficients to obtain sparse grid features through height compression. Finally, we design a hybrid detection head for joint learning. On the one hand, it predicts a scene heatmap that complements the voting point set to improve detection accuracy. On the other hand, it calibrates the target probability of detected boxes through feature fusion. On the challenging Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, PDM-SSD achieves state-of-the-art results for multi-class detection among single-modal methods, with an inference speed of 68 frames per second (FPS). We also demonstrate the advantages of PDM-SSD in detecting sparse and incomplete objects through numerous object-level examples. Additionally, PDM can serve as an auxiliary network that establishes a connection between sampling points and object centers, improving the accuracy of the model without sacrificing inference speed. Our code will be available at https://github.com/AlanLiangC/PDM-SSD.git.
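The step that associates multiple dilation centers and compresses height can be approximated by a sparse scatter-and-reduce. This is a minimal sketch under our own assumptions, not the paper's method. The abstract says the method fuses coefficients, but this sketch fuses already-filled features by summation. It also compresses height with an element-wise max over the z-cells of each (x, y) column. The dictionary-based sparse grid and the helper names `voxel_index` and `compress_height` are hypothetical.

```python
import math


def voxel_index(pos, cell):
    """Map a 3D position to an integer (ix, iy, iz) grid index."""
    return tuple(int(math.floor(p / cell)) for p in pos)


def compress_height(contributions, cell=0.2):
    """Hypothetical sparse fusion + height compression.

    contributions: iterable of (position, feature) pairs, where position is a
        3-tuple of floats and feature is a list of C floats. Several dilation
        centers may contribute to the same voxel.
    Returns a dict mapping a BEV index (ix, iy) to a list of C floats.
    """
    # 1) Fuse: sum all features that fall into the same 3D voxel.
    #    Assumption: summation stands in for the paper's coefficient fusion.
    voxels = {}
    for pos, feat in contributions:
        key = voxel_index(pos, cell)
        if key in voxels:
            voxels[key] = [a + b for a, b in zip(voxels[key], feat)]
        else:
            voxels[key] = list(feat)  # copy so the caller's list is not mutated

    # 2) Compress height: element-wise max over the voxels of each (ix, iy) column.
    bev = {}
    for (ix, iy, _iz), feat in voxels.items():
        if (ix, iy) in bev:
            bev[(ix, iy)] = [max(a, b) for a, b in zip(bev[(ix, iy)], feat)]
        else:
            bev[(ix, iy)] = list(feat)
    return bev
```

Summation lets overlapping dilations from neighboring sampled points reinforce each other. Max-pooling over height then yields a sparse 2D map of the kind a bird's-eye-view heatmap head could consume. Mean pooling or a learned fusion would be equally plausible choices.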