LOD-Net: Locality-Aware 3D Object Detection Using Multi-Scale Transformer Network

📅 2026-04-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the weak detection of small objects and semantically related entities in point clouds, which stems from point sparsity and the lack of global structural context. Building on the 3DETR framework, the authors introduce a Multi-Scale Attention (MSA) mechanism combined with feature upsampling, which produces higher-resolution feature maps so that local geometric detail can be integrated with global context. On the ScanNetv2 dataset, the 3DETR + MSA model improves over the baseline by almost 1% in mAP@25 and by 4.78% in mAP@50. Applying MSA to the lightweight 3DETR-m variant yields only limited improvement, and the authors' analysis points to the importance of adapting the upsampling strategy for such models.
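
To make the two reported metrics concrete: under the usual convention, mAP@25 and mAP@50 count a predicted box as correct only if its 3D intersection-over-union (IoU) with a ground-truth box reaches 0.25 or 0.5, respectively. The sketch below assumes axis-aligned boxes given as `(x0, y0, z0, x1, y1, z1)`; the function names and example boxes are illustrative and do not come from the paper.

```python
def box_volume(box):
    """Volume of an axis-aligned box (x0, y0, z0, x1, y1, z1); 0 if degenerate."""
    x0, y0, z0, x1, y1, z1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    # The intersection is itself a box: max of the mins, min of the maxes.
    inter_box = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = box_volume(inter_box)
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a unit-cube ground truth and a prediction shifted by 0.5 in x.
gt = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
pred = (0.5, 0.0, 0.0, 1.5, 1.0, 1.0)

iou = iou_3d(gt, pred)       # 0.5 / 1.5 = 1/3
match_at_25 = iou >= 0.25    # True: counts as a hit for mAP@25
match_at_50 = iou >= 0.50    # False: counts as a miss for mAP@50
```

A loosely localized box like this one helps mAP@25 but not mAP@50, which is why a larger gain at the stricter threshold is usually read as tighter box localization.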

📝 Abstract
3D object detection in point cloud data remains a challenging task due to the sparsity and lack of global structure inherent in the input. In this work, we propose a novel Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture to better capture both local geometry and global context. Our method introduces an upsampling operation that generates high-resolution feature maps, enabling the network to better detect smaller and semantically related objects. Experiments conducted on the ScanNetv2 dataset demonstrate that our 3DETR + MSA model improves detection performance, achieving a gain of almost 1% in mAP@25 and 4.78% in mAP@50 over the baseline. While applying MSA to the 3DETR-m variant shows limited improvement, our analysis reveals the importance of adapting the upsampling strategy for lightweight models. These results highlight the effectiveness of combining hierarchical feature extraction with attention mechanisms in enhancing 3D scene understanding.
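
The idea of pairing an upsampling step with attention over more than one resolution can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the inverse-distance k-NN interpolation, the averaging of the two attention outputs, and all function names and tensor sizes are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def upsample_features(coarse_xyz, coarse_feat, fine_xyz, k=3, eps=1e-8):
    """Assumed upsampling: inverse-distance-weighted k-NN interpolation
    of coarse point features onto a denser set of points."""
    # Pairwise squared distances, shape (num_fine, num_coarse).
    d2 = ((fine_xyz[:, None, :] - coarse_xyz[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]            # k nearest coarse points
    nn_d2 = np.take_along_axis(d2, idx, axis=1)
    w = 1.0 / (nn_d2 + eps)
    w = w / w.sum(axis=1, keepdims=True)           # normalize weights per fine point
    return (coarse_feat[idx] * w[..., None]).sum(axis=1)


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def multi_scale_attention(queries, coarse_feat, fine_feat):
    """Assumed fusion: attend to each scale separately, then average."""
    out_coarse = attention(queries, coarse_feat, coarse_feat)  # global context
    out_fine = attention(queries, fine_feat, fine_feat)        # local detail
    return 0.5 * (out_coarse + out_fine)


rng = np.random.default_rng(0)
fine_xyz = rng.uniform(size=(64, 3))       # denser point set
coarse_xyz = fine_xyz[::4]                 # 16 subsampled points
coarse_feat = rng.normal(size=(16, 8))     # stand-in for encoder features
fine_feat = upsample_features(coarse_xyz, coarse_feat, fine_xyz)
queries = rng.normal(size=(4, 8))          # stand-in for object queries
out = multi_scale_attention(queries, coarse_feat, fine_feat)
print(out.shape)  # (4, 8)
```

In this sketch the fine-scale keys are four times denser than the coarse ones, so each query can weight nearby geometry more selectively while the coarse branch keeps a scene-level view. How the paper actually fuses the scales, and how the upsampling is adapted for 3DETR-m, is not specified in the abstract.
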
Problem

Research questions and friction points this paper is trying to address.

3D object detection
point cloud
sparsity
global structure
local geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Scale Attention
3D Object Detection
Point Cloud
3DETR
Upsampling