🤖 AI Summary
To address the computational and memory bottlenecks of deploying point-cloud 3D object detection models on edge devices, this paper proposes StripDet, a lightweight real-time detection framework. Methodologically, StripDet introduces three key components: (1) the Strip Attention Block, which decomposes 2D convolutions into asymmetric strip convolutions to capture long-range spatial dependencies while reducing computational complexity from quadratic to linear; (2) a hardware-friendly hierarchical backbone integrating strip convolutions, depthwise separable convolutions, and lightweight attention mechanisms; and (3) a simple multi-scale feature fusion strategy that keeps the pipeline efficient end to end. Evaluated on the KITTI dataset, StripDet achieves 79.97% mAP for car detection with only 0.65M parameters, seven times fewer than PointPillars, while outperforming existing lightweight and knowledge-distillation-based approaches. The framework demonstrates a superior efficiency–accuracy trade-off for resource-constrained edge deployment.
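To make the cost claims concrete, the arithmetic below counts weights for the layer types named above. It is a minimal sketch under stated assumptions: per-channel (depthwise) kernels for the square-versus-strip comparison, stride 1, biases ignored, and an illustrative width of 64 channels that is not taken from the paper. All function names are hypothetical.

```python
# Hypothetical helpers: weight counts per layer (bias terms ignored).
# For a stride-1 convolution with "same" padding, multiply-adds per output
# position equal these weight counts, so compute scales the same way.


def square_depthwise_weights(channels: int, k: int) -> int:
    """One k x k kernel per channel: grows quadratically in k."""
    return channels * k * k


def strip_depthwise_weights(channels: int, k: int) -> int:
    """One 1 x k kernel plus one k x 1 kernel per channel: grows linearly in k."""
    return channels * 2 * k


def standard_conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Dense k x k convolution mixing every input channel into every output channel."""
    return c_in * c_out * k * k


def depthwise_separable_weights(c_in: int, c_out: int, k: int) -> int:
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    channels = 64  # illustrative width, not taken from the paper
    for k in (3, 7, 11, 21):
        sq = square_depthwise_weights(channels, k)
        st = strip_depthwise_weights(channels, k)
        print(f"k={k:2d}: square={sq:6d}  strip={st:5d}  ratio={sq / st:.1f}")
    print("standard 3x3:", standard_conv_weights(64, 64, 3))
    print("depthwise separable 3x3:", depthwise_separable_weights(64, 64, 3))
```

Because the square-to-strip ratio is k²/(2k) = k/2, the saving is only 1.5× for a 3×3 kernel but 10.5× for a 21×21 one, so strip decomposition is most attractive for the long kernels needed to model long-range context. Under the same assumptions, a 64-to-64-channel 3×3 depthwise separable layer needs 4,672 weights versus 36,864 for a standard convolution.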
📝 Abstract
Deploying high-accuracy 3D object detection models on point clouds remains challenging because of their substantial computational and memory requirements. To address this, we introduce StripDet, a lightweight framework designed for on-device efficiency. First, we propose the Strip Attention Block (SAB), an efficient module for capturing long-range spatial dependencies. By decomposing standard 2D convolutions into asymmetric strip convolutions, SAB extracts directional features while reducing computational complexity from quadratic to linear. Second, we design a hardware-friendly hierarchical backbone that integrates SAB with depthwise separable convolutions and a simple multiscale fusion strategy, achieving end-to-end efficiency. Extensive experiments on the KITTI dataset demonstrate StripDet's effectiveness. With only 0.65M parameters, our model achieves 79.97% mAP for car detection, surpassing the PointPillars baseline while using 7× fewer parameters. Furthermore, StripDet outperforms recent lightweight and knowledge-distillation-based methods, achieving a better accuracy-efficiency trade-off and establishing itself as a practical solution for real-world 3D detection on edge devices.
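The abstract does not spell out the SAB's internals, so the following is only a sketch of one plausible strip-attention gate, written in plain Python on a single channel for readability. It assumes, hypothetically, that a 1×k horizontal strip convolution is followed by a k×1 vertical one, that the result passes through a sigmoid, and that this attention map re-weights the input element-wise. The sigmoid gate, the kernel values, and every function name are illustrative choices, not the authors' design. The example also checks that the two strips together behave like a single k×k kernel equal to the outer product of the two strip kernels.

```python
import math
from typing import List

Grid = List[List[float]]  # one channel of a 2D feature map, indexed [row][col]


def transpose(x: Grid) -> Grid:
    return [list(col) for col in zip(*x)]


def strip_conv_h(x: Grid, kernel: List[float]) -> Grid:
    """1 x k strip convolution along each row (cross-correlation, zero 'same' padding)."""
    k = len(kernel)
    assert k % 2 == 1, "odd kernel length keeps the output aligned with the input"
    pad = k // 2
    width = len(x[0])
    out = []
    for row in x:
        out_row = []
        for j in range(width):
            acc = 0.0
            for t, w in enumerate(kernel):
                jj = j + t - pad
                if 0 <= jj < width:  # zero padding: skip out-of-range columns
                    acc += w * row[jj]
            out_row.append(acc)
        out.append(out_row)
    return out


def strip_conv_v(x: Grid, kernel: List[float]) -> Grid:
    """k x 1 strip convolution along each column, implemented via transposition."""
    return transpose(strip_conv_h(transpose(x), kernel))


def dense_conv(x: Grid, kernel2d: Grid) -> Grid:
    """Reference k x k convolution (cross-correlation, zero 'same' padding)."""
    k = len(kernel2d)
    pad = k // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for s in range(k):
                for t in range(k):
                    ii, jj = i + s - pad, j + t - pad
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel2d[s][t] * x[ii][jj]
            out[i][j] = acc
    return out


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def strip_attention(x: Grid, k_h: List[float], k_v: List[float]) -> Grid:
    """Hypothetical SAB-style gate: out = x * sigmoid(conv_kx1(conv_1xk(x)))."""
    attn = strip_conv_v(strip_conv_h(x, k_h), k_v)
    return [[xv * sigmoid(av) for xv, av in zip(x_row, a_row)] for x_row, a_row in zip(x, attn)]


if __name__ == "__main__":
    x = [[float((5 * i + j) % 7) for j in range(6)] for i in range(5)]
    k_h = [0.25, 0.5, 0.25]  # illustrative weights; a real model would learn these
    k_v = [1.0, -2.0, 1.0]
    strips = strip_conv_v(strip_conv_h(x, k_h), k_v)
    square = dense_conv(x, [[a * b for b in k_h] for a in k_v])  # rank-1 k x k kernel
    same = all(
        math.isclose(p, q, abs_tol=1e-9)
        for rp, rq in zip(strips, square)
        for p, q in zip(rp, rq)
    )
    print("strips match rank-1 square kernel:", same)
    y = strip_attention(x, k_h, k_v)
    print("output shape:", len(y), "x", len(y[0]))
```

The equivalence check exposes the trade-off: a 1×k plus k×1 pair can only express rank-1 square kernels, so the decomposition gives up some expressiveness in exchange for 2k instead of k² weights per channel. The horizontal and vertical strips also respond to row-wise and column-wise structure separately, which is one way to read the "directional features" mentioned above.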