FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices

📅 2025-06-04
🤖 AI Summary
Problem: Existing LiDAR-based 3D detection models are hard to deploy on edge devices because sparse convolutions and Transformers cause irregular memory access patterns and high computational overhead. Method: We propose ConvDotMix, a hardware-friendly module that combines large-kernel convolutions, Hadamard products, and linear transformations to enable high-order feature interaction jointly across the spatial and embedding dimensions. An implicit tensor grouping mechanism is introduced to improve memory locality. Furthermore, sparse voxels are serialized into a 1D sequence, which preserves geometric modeling capability while keeping the model lightweight. Contribution/Results: Our method achieves state-of-the-art accuracy on nuScenes and Waymo Open Dataset. On mobile GPUs/NPUs, it runs inference 1.6×–9.8× faster than mainstream approaches, significantly advancing the practical deployment of LiDAR 3D detection on low-power edge platforms.

📝 Abstract
Existing LiDAR 3D object detection methods predominantly rely on sparse convolutions and/or transformers, which can be challenging to run on resource-constrained edge devices due to irregular memory access patterns and high computational costs. In this paper, we propose FALO, a hardware-friendly approach to LiDAR 3D detection that offers both state-of-the-art (SOTA) detection accuracy and fast inference speed. More specifically, after voxelizing the 3D point cloud, FALO first arranges the sparse 3D voxels into a 1D sequence based on their coordinates and proximity. The sequence is then processed by our proposed ConvDotMix blocks, which consist of large-kernel convolutions, Hadamard products, and linear layers. ConvDotMix provides sufficient mixing capability in both the spatial and embedding dimensions, and introduces higher-order nonlinear interactions among spatial features. Furthermore, across the ConvDotMix layers we introduce implicit grouping, which balances the tensor dimensions for more efficient inference and takes the growing receptive field into account. All of these operations are well suited to resource-constrained platforms, so the proposed FALO can be readily deployed on compact, embedded devices. Our extensive evaluation on LiDAR 3D detection benchmarks such as nuScenes and Waymo shows that FALO achieves competitive performance. Meanwhile, FALO is 1.6x to 9.8x faster than the latest SOTA methods on a mobile Graphics Processing Unit (GPU) and a mobile Neural Processing Unit (NPU).
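
As a rough illustration of the serialization step, the sketch below orders integer voxel coordinates along a Morton (Z-order) curve so that voxels close together in 3D tend to sit close together in the 1D sequence. The abstract only says the ordering is based on coordinates and proximity; the Morton curve is a stand-in chosen for this example and is not confirmed to be what FALO uses. The names `morton_key` and `serialize_voxels` are hypothetical.

```python
from typing import List, Tuple

Voxel = Tuple[int, int, int]  # integer (x, y, z) voxel indices


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key.

    Assumption: a Morton curve stands in for the paper's unspecified
    coordinate/proximity-based ordering.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def serialize_voxels(coords: List[Voxel]) -> List[int]:
    """Return the indices of `coords` sorted by Morton key.

    Only non-empty voxels are listed in `coords`, so the result is a
    dense 1D ordering of a sparse 3D grid.
    """
    return sorted(range(len(coords)), key=lambda i: morton_key(*coords[i]))
```

The returned index list can be used to gather per-voxel features into a dense sequence, which ordinary dense operators can then process without sparse-convolution kernels.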

Problem

Research questions and friction points this paper is trying to address.

Enables fast LiDAR 3D object detection on edge devices
Reduces computational costs while maintaining high accuracy
Optimizes memory access for resource-constrained platforms

Innovation

Methods, ideas, or system contributions that make the work stand out.

1D sequence arrangement for sparse 3D voxels
ConvDotMix blocks with large-kernel convolutions
Implicit grouping for efficient inference
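
To make the three ingredients of a ConvDotMix block concrete, here is a minimal pure-Python sketch that applies a per-channel 1D convolution along the serialized sequence, multiplies the result element-wise (Hadamard product) with a linear projection of the input, and then applies an output linear layer. The exact wiring, normalization, and the implicit grouping used in FALO are not given in the abstract, so this arrangement and every name in it (`conv_dot_mix`, `w_gate`, `w_out`) are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]  # shape: (sequence_length, channels)


def depthwise_conv1d(x: Matrix, kernel: List[List[float]]) -> Matrix:
    """Per-channel sliding-window sum along the sequence, zero padded.

    kernel[c][k] holds the weights for channel c; the kernel size is odd.
    Mixes information across positions (the spatial dimension).
    """
    length, channels = len(x), len(x[0])
    ksize = len(kernel[0])
    half = ksize // 2
    out = [[0.0] * channels for _ in range(length)]
    for t in range(length):
        for c in range(channels):
            acc = 0.0
            for k in range(ksize):
                src = t + k - half
                if 0 <= src < length:  # positions outside count as zero
                    acc += kernel[c][k] * x[src][c]
            out[t][c] = acc
    return out


def linear(x: Matrix, weight: Matrix) -> Matrix:
    """Mix channels at each position; weight is (in_channels, out_channels)."""
    n_out = len(weight[0])
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) for j in range(n_out)]
        for row in x
    ]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise product, which introduces second-order interactions."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv_dot_mix(x: Matrix, kernel: List[List[float]],
                 w_gate: Matrix, w_out: Matrix) -> Matrix:
    """Hypothetical block: conv branch (*) linear branch, then output linear."""
    spatial = depthwise_conv1d(x, kernel)   # spatial mixing
    gate = linear(x, w_gate)                # embedding mixing
    return linear(hadamard(spatial, gate), w_out)
```

Every operation here is a dense convolution, multiplication, or matrix product over a regular tensor, which is the property the abstract credits for efficient execution on mobile GPUs and NPUs.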