You Only Click Once: Single Point Weakly Supervised 3D Instance Segmentation for Autonomous Driving

📅 2025-02-27
🤖 AI Summary
To address the high annotation cost in LiDAR point cloud 3D instance segmentation for autonomous driving, this paper proposes a single-click weakly supervised paradigm: only one click on the bird’s-eye view (BEV) is required to generate high-quality 3D pseudo-labels. Methodologically, we introduce the first weakly supervised framework integrating vision foundation models with point cloud geometric constraints, incorporating cross-frame temporal consistency modeling, density-aware spatial modeling, and IoU-confidence-guided collaborative pseudo-label refinement. On the Waymo Open Dataset, our method achieves performance comparable to fully supervised Cylinder3D using merely 0.8% of full annotations—significantly outperforming existing weakly supervised approaches and establishing new state-of-the-art (SOTA) results. Our core contributions are threefold: (1) the first formalization of the single-click weak supervision setting for 3D instance segmentation; (2) a novel geometry–semantics joint modeling paradigm; and (3) an efficient, robust pseudo-label generation and optimization mechanism.
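The cross-frame temporal consistency idea mentioned above can be pictured as a voting scheme over predictions from adjacent frames. The sketch below is a toy stand-in under assumed conventions (masks as sets of point indices, ego-motion alignment already applied); the function name and voting rule are illustrative, not from the paper.

```python
from collections import Counter

def temporal_vote(masks, min_votes=2):
    """Keep only points assigned to the instance in at least `min_votes`
    of the adjacent-frame prediction masks (assumed already aligned to a
    common frame). Illustrative sketch, not the paper's exact rule."""
    counts = Counter(p for mask in masks for p in mask)
    return {p for p, c in counts.items() if c >= min_votes}
```

Points predicted consistently across frames survive, while one-off spurious points are filtered out.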

📝 Abstract
Outdoor LiDAR point cloud 3D instance segmentation is a crucial task in autonomous driving. However, training a segmentation model requires laborious human effort to annotate the point cloud. To address this challenge, we propose the YoCo framework, which generates 3D pseudo labels from minimal coarse click annotations in the bird's-eye-view (BEV) plane. Producing high-quality pseudo labels from such sparse annotations is a significant challenge. YoCo first leverages vision foundation models combined with geometric constraints from point clouds to enhance pseudo label generation. Second, a temporal- and spatial-based label updating module is designed to generate reliable updated labels; it leverages predictions from adjacent frames and exploits the inherent density variation of point clouds (dense near, sparse far). Finally, to further improve label quality, an IoU-guided enhancement module is proposed, replacing pseudo labels with high-confidence, high-IoU predictions. Experiments on the Waymo dataset demonstrate YoCo's effectiveness and generality, achieving state-of-the-art performance among weakly supervised methods and surpassing fully supervised Cylinder3D. Additionally, YoCo is applicable to various networks, achieving performance comparable to fully supervised methods with minimal fine-tuning using only 0.8% of the fully labeled data, significantly reducing annotation costs.
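The IoU-guided enhancement module described in the abstract amounts to a gated replacement rule: a pseudo label is swapped for the network's prediction only when that prediction is both high-confidence and strongly overlaps the current label. A minimal sketch, assuming masks are represented as sets of point indices; the thresholds and function names are illustrative, not taken from the paper.

```python
def iou(mask_a, mask_b):
    """IoU between two point masks represented as sets of point indices."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

def refine_pseudo_label(pseudo_mask, pred_mask, pred_conf,
                        conf_thresh=0.9, iou_thresh=0.7):
    """Replace the pseudo label with the prediction only when the
    prediction is high-confidence AND agrees strongly with the label.
    Thresholds are illustrative placeholders."""
    if pred_conf >= conf_thresh and iou(pseudo_mask, pred_mask) >= iou_thresh:
        return pred_mask
    return pseudo_mask
```

The two-sided gate is the point of the design: confidence alone would accept confidently wrong predictions, while IoU alone would accept low-quality ones that merely resemble the current label.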
Problem

Research questions and friction points this paper is trying to address.

High annotation cost of LiDAR point clouds for 3D instance segmentation
Producing high-quality 3D pseudo labels from sparse click annotations
Generalizing weak supervision across different segmentation networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single coarse click annotation per instance in the BEV plane
Temporal- and density-aware spatial label updating module
IoU- and confidence-guided pseudo-label enhancement
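The "dense near, sparse far" density variation that the label updating module exploits can be made concrete with a range-dependent acceptance criterion: far-away instances return fewer LiDAR points, so any point-count requirement should shrink with distance. The helper below is a hypothetical sketch; the inverse-square scaling and all constants are assumptions for illustration, not the paper's formula.

```python
import math

def density_aware_min_points(center_xy, base_points=50, ref_dist=10.0):
    """Minimum point count required to accept an instance label,
    relaxed with range. LiDAR point density falls off roughly with
    the square of distance, so beyond `ref_dist` the requirement is
    scaled by (ref_dist / d)^2. All constants are illustrative."""
    d = math.hypot(center_xy[0], center_xy[1])
    scale = (ref_dist / max(d, ref_dist)) ** 2
    return max(1, int(base_points * scale))
```

With these placeholder constants, an instance 10 m away must contain 50 points, while one at 20 m only needs 12, reflecting the sparser returns at range.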
Guangfeng Jiang
University of Science and Technology of China
Jun Liu
University of Science and Technology of China
Yongxuan Lv
University of Science and Technology of China
Yuzhi Wu
University of Science and Technology of China
Xianfei Li
COWAROBOT
Wenlong Liao
COWAROBOT, RoboticsAI
Tao He
COWAROBOT
Pai Peng
COWAROBOT