BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
In 3D instance segmentation, dense point-level annotations are expensive to collect, while bounding-box supervision is ambiguous where boxes overlap, which makes point-to-instance assignment difficult. To address this, the paper proposes BEEP3D, an end-to-end pseudo-mask generation framework. The method uses a student-teacher setup in which the teacher acts as the pseudo-labeler and is updated from the student by an exponential moving average. It adds an instance center-based query refinement and two consistency losses (a query consistency loss and a masked feature consistency loss) to obtain more precise pseudo-masks under box-level supervision. Unlike prior two-stage pseudo-labeling approaches, the framework needs no separate training stage for the pseudo-labeler and can be optimized end to end. On ScanNetV2 and S3DIS, it achieves competitive or superior performance compared to state-of-the-art weakly supervised methods while remaining computationally efficient.

📝 Abstract
3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations, resulting in substantial annotation costs and labor overhead. To mitigate this, box-level annotations have been explored as a weaker but more scalable form of supervision. However, box annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. However, such two-stage pipelines often increase overall training time and complexity and hinder end-to-end optimization. To overcome these challenges, we propose BEEP3D: Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation. BEEP3D adopts a student-teacher framework, where the teacher model serves as a pseudo-labeler and is updated by the student model via an Exponential Moving Average. To better guide the teacher model to generate precise pseudo-masks, we introduce an instance center-based query refinement that enhances position query localization and leverages features near instance centers. Additionally, we design two novel losses, a query consistency loss and a masked feature consistency loss, to align semantic and geometric signals between predictions and pseudo-masks. Extensive experiments on the ScanNetV2 and S3DIS datasets demonstrate that BEEP3D achieves competitive or superior performance compared to state-of-the-art weakly supervised methods while remaining computationally efficient.
Problem

Research questions and friction points this paper is trying to address.

Reducing annotation costs for 3D instance segmentation
Resolving ambiguity in box-supervised instance assignment
Enabling end-to-end optimization in pseudo-mask generation
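
The box ambiguity named above can be made concrete with a small sketch. It is not taken from the paper: the function names, the axis-aligned box representation, and the label codes (-1 for background, -2 for ambiguous) are assumptions chosen for illustration. It shows that a point inside exactly one box gets a label directly, while a point inside several boxes cannot be assigned from box supervision alone.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner) of an axis-aligned box

BACKGROUND = -1  # hypothetical code: point lies in no box
AMBIGUOUS = -2   # hypothetical code: point lies in two or more boxes


def point_in_box(p: Point, box: Box) -> bool:
    """True if p lies inside the axis-aligned box (boundaries included)."""
    lo, hi = box
    return all(lo[i] <= p[i] <= hi[i] for i in range(3))


def assign_points(points: Sequence[Point], boxes: Sequence[Box]) -> List[int]:
    """Label each point with its box index when exactly one box contains it."""
    labels = []
    for p in points:
        hits = [i for i, b in enumerate(boxes) if point_in_box(p, b)]
        if len(hits) == 1:
            labels.append(hits[0])
        elif not hits:
            labels.append(BACKGROUND)
        else:
            labels.append(AMBIGUOUS)  # box supervision alone cannot decide
    return labels


# Two overlapping boxes and four probe points.
boxes = [((0, 0, 0), (2, 2, 2)), ((1, 1, 1), (3, 3, 3))]
points = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5), (2.5, 2.5, 2.5), (5, 5, 5)]
labels = assign_points(points, boxes)
print(labels)  # [0, -2, 1, -1]
```

The second point falls in the overlap of both boxes, which is the kind of region where a pseudo-labeler has to supply the missing point-to-instance assignment.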
Innovation

Methods, ideas, or system contributions that make the work stand out.

Student-teacher framework with exponential moving average updates
Instance center-based query refinement for precise localization
Query and masked feature consistency losses for alignment
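
As a rough illustration of the exponential moving average update in the student-teacher framework, the sketch below updates a teacher's parameters from a student's. It is a generic EMA rule, not the paper's implementation: the dict-of-lists parameter store stands in for real model tensors, and the `momentum` value is an assumption, since the summary does not state one.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # stand-in for named model tensors


def ema_update(teacher: Params, student: Params, momentum: float = 0.99) -> None:
    """In-place update: teacher = momentum * teacher + (1 - momentum) * student."""
    for name, student_vals in student.items():
        teacher_vals = teacher[name]
        for i, s in enumerate(student_vals):
            teacher_vals[i] = momentum * teacher_vals[i] + (1.0 - momentum) * s


# Toy parameters; the momentum here is illustrative only.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
ema_update(teacher, student, momentum=0.9)
print(teacher["w"])  # first entry moves a tenth of the way toward the student
```

Because only the student receives gradients, the teacher changes slowly and its pseudo-masks stay relatively stable from one training step to the next, which is the usual motivation for this kind of update.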
Youngju Yoo
School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea 31414
Seho Kim
School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea 31414
Changick Kim
Korea Advanced Institute of Science and Technology
Computer vision