Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes

📅 2025-07-25
🤖 AI Summary
To address the challenge of LiDAR–RGB fusion for 3D object detection in outdoor scenes, this paper proposes an efficient multi-stream fusion network. The method employs three parallel branches to extract features from LiDAR pillar grids, bird’s-eye-view (BEV) representations, and UV-mapped RGB projections, respectively. Crucially, it introduces polar-coordinate indexing—a novel mechanism for cross-modal alignment—enabling joint modeling of geometric structure, texture details, and spatial layout. The architecture integrates LiDAR-PillarNet, height-compressed encoding, UV projection, polar-coordinate feature indexing, and multi-scale feature fusion, coupled with a two-stage detection head to enhance localization accuracy. Evaluated on the KITTI benchmark, the method achieves state-of-the-art (SOTA) or near-SOTA performance across multiple classes, with significant improvements in mean average precision (mAP), while maintaining real-time inference speed. The source code is publicly available.
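The UV-mapped RGB projection described in the summary rests on a standard operation: projecting LiDAR points into the camera image plane to look up pixel (u, v) coordinates. A minimal sketch of that projection using the usual KITTI-style calibration chain (the function name and all matrix values below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def project_lidar_to_uv(points, P, Tr_velo_to_cam):
    """Project N x 3 LiDAR points into image (u, v) pixel coordinates.

    points         : (N, 3) xyz in the LiDAR frame
    P              : (3, 4) camera projection matrix
    Tr_velo_to_cam : (4, 4) rigid transform from LiDAR to camera frame
    Returns (N, 2) pixel coords and a mask of points in front of the camera.
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])  # homogeneous (N, 4)
    cam = (Tr_velo_to_cam @ pts_h.T).T            # camera frame (N, 4)
    front = cam[:, 2] > 0.1                       # keep points ahead of camera
    img = (P @ cam.T).T                           # image plane (N, 3)
    uv = img[:, :2] / img[:, 2:3]                 # perspective divide
    return uv, front
```

Points behind the camera must be masked out before the perspective divide is trusted, which is why the sketch returns the `front` mask alongside the coordinates.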

📝 Abstract
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy and has recently started gaining traction. However, effectively integrating these modalities for precise object detection remains a largely open problem. To address it, we propose a MultiStream Detection (MuStD) network that meticulously extracts task-relevant information from both data modalities. The network follows a three-stream structure. Its LiDAR-PillarNet stream extracts sparse 2D pillar features from the LiDAR input, while the LiDAR-Height Compression stream computes Bird's-Eye View (BEV) features. An additional 3D Multimodal stream combines RGB and LiDAR features using UV mapping and polar coordinate indexing. Finally, the features containing comprehensive spatial, textural and geometric information are carefully fused and fed to a detection head for 3D object detection. Our extensive evaluation on the challenging KITTI Object Detection Benchmark using the public testing server at https://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=d162ec699d6992040e34314d19ab7f5c217075e0 establishes the efficacy of our method, achieving new state-of-the-art or highly competitive results in different categories while remaining among the most efficient methods. Our code will be released through the MuStD GitHub repository at https://github.com/IbrahimUWA/MuStD.git
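The polar coordinate indexing mentioned in the abstract can be understood as assigning each point a grid cell by range and azimuth rather than by a Cartesian grid, which matches the radial scanning pattern of a LiDAR. A hypothetical sketch of such an indexing scheme (the bin counts and maximum range are assumptions, not the paper's settings):

```python
import numpy as np

def polar_index(points, num_r_bins=64, num_theta_bins=128, max_range=70.0):
    """Assign each point (x, y) a flat polar-grid cell index.

    points : (N, 2+) with x in column 0 and y in column 1.
    Returns integer indices in [0, num_r_bins * num_theta_bins).
    """
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)                  # range from sensor
    theta = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    r_bin = np.clip((r / max_range * num_r_bins).astype(int), 0, num_r_bins - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * num_theta_bins).astype(int)
    t_bin = np.clip(t_bin, 0, num_theta_bins - 1)
    return r_bin * num_theta_bins + t_bin
```

Such an index lets features computed on image pixels and features computed on LiDAR points be gathered into the same cells for cross-modal alignment.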
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D object detection accuracy using LiDAR and RGB fusion
Integrating LiDAR and RGB modalities effectively for precise detection
Achieving state-of-the-art results in outdoor 3D object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

MultiStream network fuses LiDAR and RGB data
Three-stream structure extracts diverse features
UV mapping combines RGB and LiDAR features
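The pillar and height-compression streams both reduce the 3D point cloud to a 2D representation that standard convolutions can process. A minimal height-compression sketch producing a BEV max-height map (the grid extents and resolution are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def bev_height_map(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), res=0.4):
    """Compress a point cloud to a BEV max-height map (height compression).

    points : (N, 3) xyz in the LiDAR frame.
    Returns a 2D array holding the maximum z per grid cell (0 where empty).
    """
    nx = round((x_range[1] - x_range[0]) / res)
    ny = round((y_range[1] - y_range[0]) / res)
    bev = np.full((nx, ny), -np.inf)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, z in zip(ix[valid], iy[valid], points[valid, 2]):
        bev[i, j] = max(bev[i, j], z)             # keep tallest return per cell
    bev[np.isinf(bev)] = 0.0                      # empty cells -> 0
    return bev
```

A pillar encoder differs in that each occupied cell keeps a small learned feature vector rather than a single height statistic, but the scatter-to-grid step is the same.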
Muhammad Ibrahim
Department of Computer Science, The University of Western Australia
Naveed Akhtar
School of Computing & Information Systems, The University of Melbourne
Haitian Wang
University of Western Australia
3D point cloud · Computer vision · Machine learning · IoT · Remote sensing
Saeed Anwar
University of Western Australia; Australian National University
Computer Vision · 3D Vision · Machine learning · Generative AI
Ajmal Mian
Department of Computer Science, The University of Western Australia