PAN: Pillars-Attention-Based Network for 3D Object Detection

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing camera-radar fusion methods for 3D object detection fail to fully exploit the radar point cloud's advantages in range and radial velocity estimation, while suffering from high computational overhead and poor real-time performance due to complex multimodal fusion architectures. This paper proposes a lightweight camera-radar fusion network leveraging pillar attention mechanisms for real-time bird's-eye-view (BEV) detection. First, a radar pillar feature embedding module explicitly encodes range and radial velocity information. Second, an intra-pillar self-attention mechanism models geometric and kinematic dependencies among the points within each pillar. Third, a simplified convolutional fusion module replaces the conventional Feature Pyramid Network (FPN) to reduce feature-aggregation complexity. Evaluated on nuScenes, the method achieves a new state-of-the-art 58.2 NDS at 42 FPS, the fastest inference among comparable approaches, while improving robustness and efficiency under adverse environmental conditions.
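The paper does not give the attention module at code level; a minimal NumPy sketch of what single-head self-attention over the radar points of one pillar could look like, with all shapes, feature choices, and weight matrices assumed for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_pillar_self_attention(points, w_q, w_k, w_v):
    """Single-head self-attention over the N radar points of one pillar.

    points: (N, D) per-point features (e.g. x, y, z, RCS, range,
    radial velocity); the exact feature set here is an assumption,
    not the paper's specification.
    """
    q = points @ w_q                            # (N, d) queries
    k = points @ w_k                            # (N, d) keys
    v = points @ w_v                            # (N, d) values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (N, N) pairwise affinities
    attn = softmax(scores, axis=-1)             # each row sums to 1
    return attn @ v                             # (N, d) context-enriched points

rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 6))                   # 8 radar points, 6 raw features
d = 16
out = intra_pillar_self_attention(
    pts,
    rng.normal(size=(6, d)),
    rng.normal(size=(6, d)),
    rng.normal(size=(6, d)),
)
print(out.shape)  # (8, 16)
```

Each output row mixes information from every other point in the pillar, which is how such a mechanism can model geometric and kinematic dependencies before pooling.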

📝 Abstract
Camera-radar fusion offers a robust and low-cost alternative to camera-lidar fusion for real-time 3D object detection under adverse weather and lighting conditions. However, few works in the literature focus on this modality and, most importantly, develop new architectures that exploit the advantages of the radar point cloud, such as accurate distance estimation and speed information. Therefore, this work presents a novel and efficient 3D object detection algorithm using cameras and radars in the bird's-eye view (BEV). Our algorithm exploits the advantages of radar before fusing the features into a detection head. A new backbone is introduced, which maps the radar pillar features into an embedding dimension. A self-attention mechanism allows the backbone to model the dependencies between the radar points. We use a simplified convolutional layer to replace the FPN-based convolutional layers used in PointPillars-based architectures, with the main goal of reducing inference time. Our results show that, with this modification, our approach achieves a new state of the art in 3D object detection, reaching 58.2 NDS with a ResNet-50 backbone, while also setting a new benchmark for inference time on the nuScenes dataset in the same category.
Problem

Research questions and friction points this paper is trying to address.

Develops camera-radar fusion for 3D object detection
Exploits radar advantages like distance and speed estimation
Creates efficient architecture to reduce inference time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Camera-radar fusion for 3D object detection
Self-attention mechanism models radar dependencies
Simplified convolutional layer reduces inference time
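The radar pillar embedding described above can be sketched in a PointPillars-style way: augment each point with derived range and radial-velocity features, project through a shared linear layer, and max-pool over the pillar. This is a minimal NumPy illustration under assumed point features and layer sizes, not the paper's actual backbone:

```python
import numpy as np

def embed_pillar(points, w_embed):
    """Embed one pillar of radar points into a single feature vector.

    points: (N, 4) with assumed columns x, y, vx, vy (a simplified
    radar point). Appends range and radial velocity, applies a shared
    linear layer + ReLU, then max-pools over the pillar's points.
    """
    x, y, vx, vy = points.T
    rng_ = np.hypot(x, y)                                # range from ego
    v_rad = (x * vx + y * vy) / np.maximum(rng_, 1e-6)   # radial velocity
    feats = np.column_stack([points, rng_, v_rad])       # (N, 6) augmented
    embedded = np.maximum(feats @ w_embed, 0.0)          # linear + ReLU
    return embedded.max(axis=0)                          # (d,) pillar vector

rng = np.random.default_rng(1)
pts = rng.normal(size=(12, 4))                           # 12 points in a pillar
vec = embed_pillar(pts, rng.normal(size=(6, 32)))
print(vec.shape)  # (32,)
```

Explicitly injecting range and radial velocity as input features is one plausible reading of how the embedding "encodes" them before attention and fusion.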
Ruan Bispo
Department of Electronic and Computer Engineering, Lero (the Research Ireland Centre for Software), and the Data Driven Computer Engineering (D2iCE) Research Centre at the University of Limerick, Limerick, V94 T9PX Ireland.
Dane Mitrev
Provizio, Future Mobility Campus Ireland, Shannon Free Zone, V14WV82, Ireland.
Letizia Mariotti
Provizio, Future Mobility Campus Ireland, Shannon Free Zone, V14WV82, Ireland.
Clément Botty
Provizio, Future Mobility Campus Ireland, Shannon Free Zone, V14WV82, Ireland.
Denver Humphrey
Provizio, Future Mobility Campus Ireland, Shannon Free Zone, V14WV82, Ireland.
Anthony Scanlan
Department of Electronic and Computer Engineering, Lero (the Research Ireland Centre for Software), and the Data Driven Computer Engineering (D2iCE) Research Centre at the University of Limerick, Limerick, V94 T9PX Ireland.
Ciarán Eising
University of Limerick