Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

📅 2025-03-21
🤖 AI Summary
Problem: Conventional LiDAR pipelines supervise 3D object detection with bounding-box labels and semantic segmentation with point-wise semantic labels. The two label sets are largely redundant, and producing both is a heavy annotation burden. Method: This paper proposes a box-free paradigm that supervises detection solely with semantic segmentation labels. Because point-cloud instances have incomplete geometry and ambiguous boundaries, naively derived pseudo-boxes are of low quality. The paper therefore introduces Multi-Frame Multi-Scale Clustering (MFMS-C), which uses spatio-temporal consistency to generate more accurate pseudo-boxes, and Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST), which progressively refines the pseudo-labels and mines instances that received no pseudo-label. Contribution/Results: The method reduces annotation dependency by removing the need for box labels. On the Waymo Open Dataset and nuScenes, it reports mAP gains of 23.7% and 10.3%, respectively, over competing methods, pointing toward a label-efficient paradigm for 3D object detection.

📝 Abstract
LiDAR-based 3D object detection and semantic segmentation are critical tasks in 3D scene understanding. Traditional detection and segmentation methods supervise their models through bounding box labels and semantic mask labels, respectively. However, these two independent label types inherently contain significant redundancy. This paper aims to eliminate the redundancy by supervising 3D object detection using only semantic labels. The challenge is that point-cloud instances have incomplete geometric structure and ambiguous boundaries, which leads to inaccurate pseudo-labels and poor detection results. To address these challenges, we propose a novel method, named Seg2Box. We first introduce a Multi-Frame Multi-Scale Clustering (MFMS-C) module, which leverages the spatio-temporal consistency of point clouds to generate accurate box-level pseudo-labels. Additionally, the Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) module is proposed to enhance performance by progressively refining the pseudo-labels and mining the instances for which no pseudo-labels were generated. Experiments on the Waymo Open Dataset and nuScenes Dataset show that our method significantly outperforms other competitive methods by 23.7% and 10.3% in mAP, respectively. The results demonstrate the label-efficiency potential of our method.
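To make the idea of turning point-wise semantic labels into box-level pseudo-labels concrete, here is a minimal sketch. It is not the MFMS-C module: it works on a single frame at a single scale, uses a simple distance-threshold connected-components clustering, and fits axis-aligned boxes without a heading angle. The function names and the parameters `radius` and `min_points` are assumptions made for illustration only.

```python
from collections import deque
from math import dist


def cluster_points(points, radius):
    """Group points into connected components.

    Two points are linked when their Euclidean distance is <= radius.
    Returns a list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # Collect neighbours first, then remove them, so the set is
            # never modified while it is being iterated.
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return clusters


def fit_axis_aligned_box(points):
    """Return (center, size) of the tightest axis-aligned box around the points."""
    mins = [min(p[k] for p in points) for k in range(3)]
    maxs = [max(p[k] for p in points) for k in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size


def pseudo_boxes(points, labels, target_class, radius=0.8, min_points=3):
    """Build pseudo-boxes for one class from point-wise semantic labels.

    radius and min_points are hypothetical hyperparameters, not values
    taken from the paper.
    """
    class_points = [p for p, l in zip(points, labels) if l == target_class]
    boxes = []
    for members in cluster_points(class_points, radius):
        if len(members) >= min_points:  # drop tiny, likely noisy clusters
            boxes.append(fit_axis_aligned_box([class_points[i] for i in members]))
    return boxes
```

A box fitted this way only covers the points that were actually observed, so a partially scanned object yields a box that is too small. That is the incomplete-geometry problem the abstract describes, and it is the motivation for aggregating evidence across frames and scales.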
Problem

Research questions and friction points this paper is trying to address.

Eliminate the redundancy between box and semantic labels by supervising 3D object detection with semantic labels only
Address incomplete geometry and boundary ambiguity in point-cloud instances
Generate accurate pseudo-labels via spatio-temporal consistency and iterative refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervises 3D object detection using only point-wise semantic labels, with no box annotations
Multi-Frame Multi-Scale Clustering for pseudo-labels
Semantic-Guiding Iterative-Mining Self-Training refinement
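One way to picture semantic guidance during self-training is as a filter on the detector's predictions before they are reused as pseudo-labels. The sketch below is an illustrative assumption, not the SGIM-ST procedure from the paper: it keeps a predicted box only if its score is high enough and most points inside it carry the matching semantic label. The dictionary keys (`center`, `size`, `cls`, `score`) and the thresholds `min_score` and `min_purity` are hypothetical.

```python
def points_in_box(points, center, size):
    """Indices of points inside an axis-aligned box given by center and size."""
    half = [s / 2 for s in size]
    return [
        i for i, p in enumerate(points)
        if all(abs(p[k] - center[k]) <= half[k] for k in range(3))
    ]


def semantic_purity(points, labels, box):
    """Fraction of points inside the box whose semantic label equals the box class."""
    inside = points_in_box(points, box["center"], box["size"])
    if not inside:
        return 0.0
    return sum(labels[i] == box["cls"] for i in inside) / len(inside)


def select_pseudo_labels(points, labels, predictions,
                         min_score=0.5, min_purity=0.7):
    """Keep predictions that are confident and agree with the semantic labels.

    min_score and min_purity are hypothetical thresholds for illustration.
    """
    return [
        b for b in predictions
        if b["score"] >= min_score
        and semantic_purity(points, labels, b) >= min_purity
    ]
```

In an iterative scheme, the boxes that survive this filter would be merged into the pseudo-label set and the detector retrained, so that each round can recover instances that earlier rounds missed.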
Maoji Zheng
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
Ziyu Xu
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
Qiming Xia
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
Hai Wu
The University of Hong Kong
Chenglu Wen
Professor of Xiamen University
3D vision, point clouds, mobile mapping, robotics
Cheng Wang
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China