EfficientPENet: Real-Time Depth Completion from Sparse LiDAR via Lightweight Multi-Modal Fusion

📅 2026-04-20

🤖 AI Summary
This work addresses efficient depth completion from sparse LiDAR measurements and RGB images with a lightweight two-branch network. The RGB branch uses an ImageNet-pretrained ConvNeXt backbone, the depth branch uses sparsity-invariant convolutions, and a Convolutional Spatial Propagation Network (CSPN) refines the prediction; the two branches are merged by late fusion and trained with multi-scale deep supervision. Through the ConvNeXt blocks, the design brings Layer Normalization, 7x7 depthwise convolutions, and stochastic depth regularization into the depth completion encoder. A position-aware test-time augmentation scheme corrects the coordinate tensors when the input is flipped horizontally. On the KITTI depth completion benchmark, the model achieves an RMSE of 631.94 mm with 36.24M parameters and a latency of 20.51 ms (48.76 FPS), a 3.7× reduction in parameters and a 23× speedup relative to BP-Net.
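
To make the sparsity-invariant convolution concrete, here is a minimal single-channel NumPy sketch of the general formulation from the sparse-depth literature: the filter response is computed only over valid pixels and normalized by the number of valid pixels in the window, and the validity mask is propagated by a max over the same window. The function name `sparse_conv2d` and its parameters are illustrative assumptions; the paper's actual layer (channels, strides, learned weights) is not described in this summary.

```python
import numpy as np

def sparse_conv2d(depth, mask, weight, bias=0.0, eps=1e-8):
    """Single-channel sparsity-invariant convolution (illustrative sketch).

    depth  : (H, W) sparse depth map, arbitrary values where mask == 0
    mask   : (H, W) binary validity mask (1 = LiDAR return present)
    weight : (k, k) filter with odd k
    Returns the normalized response and the propagated validity mask.
    """
    k = weight.shape[0]
    pad = k // 2
    # Zero out invalid pixels, then zero-pad both depth and mask.
    d = np.pad(depth * mask, pad)
    m = np.pad(mask, pad)
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float64)
    new_mask = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            d_win = d[i:i + k, j:j + k]
            m_win = m[i:i + k, j:j + k]
            # Normalize by the count of valid pixels, not by the window size.
            out[i, j] = (weight * d_win).sum() / (m_win.sum() + eps) + bias
            # A pixel becomes valid if any input pixel in its window was valid.
            new_mask[i, j] = m_win.max()
    return out, new_mask

# Usage: one LiDAR return of 10 m in an otherwise empty 5x5 map.
depth = np.zeros((5, 5))
mask = np.zeros((5, 5))
depth[2, 2], mask[2, 2] = 10.0, 1.0
out, new_mask = sparse_conv2d(depth, mask, np.ones((3, 3)))
```

With an all-ones filter the output at each pixel is the mean of the valid depths in its window, so the response does not shrink as the input gets sparser, which is the property a plain convolution lacks.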

📝 Abstract
Depth completion from sparse LiDAR measurements and corresponding RGB images is a prerequisite for accurate 3D perception in robotic systems. Existing methods achieve high accuracy on standard benchmarks but rely on heavy backbone architectures that preclude real-time deployment on embedded hardware. We present EfficientPENet, a two-branch depth completion network that replaces the conventional ResNet encoder with a modernized ConvNeXt backbone, introduces sparsity-invariant convolutions for the depth stream, and refines predictions through a Convolutional Spatial Propagation Network (CSPN). The RGB branch leverages ImageNet-pretrained ConvNeXt blocks with Layer Normalization, 7x7 depthwise convolutions, and stochastic depth regularization. Features from both branches are merged via late fusion and decoded through a multi-scale deep supervision strategy. We further introduce a position-aware test-time augmentation scheme that corrects coordinate tensors during horizontal flipping, yielding consistent error reduction at inference. On the KITTI depth completion benchmark, EfficientPENet achieves an RMSE of 631.94 mm with 36.24M parameters and a latency of 20.51 ms, operating at 48.76 FPS. This represents a 3.7 times reduction in parameters and a 23 times speedup relative to BP-Net, while maintaining competitive accuracy. These results establish EfficientPENet as a practical solution for real-time depth completion on resource-constrained edge platforms such as the NVIDIA Jetson.
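
The abstract does not spell out how the coordinate tensors are corrected during flipping, so the following NumPy sketch shows only one plausible reading: the RGB image and sparse depth are flipped horizontally, while the pixel-coordinate grid is regenerated in canonical order rather than flipped with them, and the second prediction is flipped back before averaging. The names `make_coords` and `tta_predict`, the `model(rgb, sparse, coords)` call signature, and the choice to keep the grid canonical are all assumptions made for illustration.

```python
import numpy as np

def make_coords(h, w):
    """Canonical pixel-coordinate grid, shape (2, H, W): channel 0 = u, 1 = v."""
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([u, v]).astype(np.float64)

def tta_predict(model, rgb, sparse):
    """Average predictions over the original and horizontally flipped input.

    model  : hypothetical callable (rgb, sparse, coords) -> (H, W) depth
    rgb    : (3, H, W) image
    sparse : (H, W) sparse depth
    """
    h, w = sparse.shape
    coords = make_coords(h, w)

    pred = model(rgb, sparse, coords)

    # Flip the image-like inputs along the width axis.
    rgb_f = rgb[..., ::-1]
    sparse_f = sparse[..., ::-1]
    # Assumption: the coordinate grid stays canonical instead of being
    # flipped, so each pixel still sees the coordinates of its position.
    pred_f = model(rgb_f, sparse_f, coords)

    # Undo the flip on the second prediction, then average.
    return 0.5 * (pred + pred_f[..., ::-1])

# Usage with a stand-in model that just averages the RGB channels.
rgb = np.arange(3 * 2 * 4, dtype=np.float64).reshape(3, 2, 4)
sparse = np.zeros((2, 4))
dummy_model = lambda r, s, c: r.mean(axis=0)
out = tta_predict(dummy_model, rgb, sparse)
```

For a model that is exactly flip-equivariant, as the stand-in above is, the averaged output equals the single-pass output; any gain from test-time augmentation comes from the real network's two predictions disagreeing slightly.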
Problem

Research questions and friction points this paper is trying to address.

depth completion
sparse LiDAR
real-time
embedded hardware
multi-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

EfficientPENet
ConvNeXt
sparsity-invariant convolution
Convolutional Spatial Propagation Network
test-time augmentation
Johny J. Lopez
Canizaro Livingston Gulf States Center for Environmental Informatics, University of New Orleans, New Orleans, USA
Md Meftahul Ferdaus
University of New Orleans, Postdoctoral Research Scientist
MLOps, Lightweight Neural Networks, Computer Vision and Robots, MR Materials
Mahdi Abdelguerfi
Professor of Computer Science, University of New Orleans
Geospatial Intelligence, Big Data, AI
Anton Netchaev
US Army Corps of Engineers, Engineer Research and Development Center, Vicksburg, Mississippi, USA
Steven Sloan
US Army Corps of Engineers, Engineer Research and Development Center, Vicksburg, Mississippi, USA
Ken Pathak
US Army Corps of Engineers, Engineer Research and Development Center, Vicksburg, Mississippi, USA
Kendall N. Niles
US Army Corps of Engineers, Engineer Research and Development Center, Vicksburg, Mississippi, USA