3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation

📅 2025-04-17
🤖 AI Summary
This paper proposes 3DResT, the first dedicated semi-supervised framework for 3D referring expression segmentation (3D-RES). It targets two problems: inefficient use of high-quality pseudo-labels and the discarding of low-confidence pseudo-labels that still carry useful information. The method operates on 3D point clouds and linguistic features within a teacher-student setup. Key contributions include: (1) Teacher-Student Consistency-Based Sampling (TSCS), which selects high-quality pseudo-labels and adds them to the labeled set to strengthen the supervision signal; and (2) Quality-Driven Dynamic Weighting (QDW), which keeps low-quality pseudo-labels but assigns them lower weights instead of discarding them. With only 1% labeled data, the approach improves mIoU by 8.34 points over the fully supervised baseline, indicating that both high- and low-quality pseudo-labels can be exploited effectively.

📝 Abstract
3D Referring Expression Segmentation (3D-RES) typically requires extensive instance-level annotations, which are time-consuming and costly to produce. Semi-supervised learning (SSL) mitigates this by combining limited labeled data with abundant unlabeled data, improving performance while reducing annotation costs. SSL methods commonly use a teacher-student paradigm in which the teacher generates pseudo-labels, filtered by a high confidence threshold, to guide the student. However, in 3D-RES, where each label corresponds to a single mask and labeled data is scarce, existing SSL methods treat high-quality pseudo-labels merely as auxiliary supervision, which limits the model's learning potential. Moreover, filtering by a high confidence threshold often discards potentially valuable pseudo-labels, restricting the model's ability to exploit the abundant unlabeled data. We therefore identify two critical challenges in semi-supervised 3D-RES: inefficient utilization of high-quality pseudo-labels and waste of useful information in low-quality pseudo-labels. In this paper, we introduce the first semi-supervised learning framework for 3D-RES, a robust baseline named 3DResT. To address these challenges, we propose two novel designs: Teacher-Student Consistency-Based Sampling (TSCS) and Quality-Driven Dynamic Weighting (QDW). TSCS selects high-quality pseudo-labels and integrates them into the labeled dataset to strengthen the labeled supervision signal. QDW retains low-quality pseudo-labels by dynamically assigning them lower weights, so that useful information is extracted from them rather than discarded. Extensive experiments on a widely used benchmark demonstrate the effectiveness of our method. Notably, with only 1% labeled data, 3DResT improves mIoU by 8.34 points over the fully supervised method.
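The contrast between hard confidence filtering and the idea behind QDW can be illustrated with a small sketch. The abstract does not give the actual weighting function, so everything below is an assumption made for illustration: the function names, the threshold `tau`, and the rule "full weight above `tau`, weight proportional to the quality score below it" are hypothetical and not the paper's formula.

```python
def hard_threshold_weights(scores, tau=0.9):
    """Conventional filtering: pseudo-labels below tau get weight 0 (discarded)."""
    return [1.0 if s >= tau else 0.0 for s in scores]


def dynamic_weights(scores, tau=0.9, low_scale=0.5):
    """Assumed QDW-style rule (hypothetical): keep every pseudo-label,
    but down-weight those below tau in proportion to their quality score."""
    return [1.0 if s >= tau else low_scale * s for s in scores]


def weighted_loss(losses, weights):
    """Weight-normalized average of per-sample losses; 0.0 if all weights are 0."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


# Three unlabeled samples with made-up quality scores and per-sample losses.
scores = [0.95, 0.6, 0.2]
losses = [0.1, 0.4, 0.9]

hard = hard_threshold_weights(scores)   # only the first sample contributes
soft = dynamic_weights(scores)          # all three contribute, two at reduced weight

print(weighted_loss(losses, hard))
print(weighted_loss(losses, soft))
```

Under hard filtering, the two lower-scoring samples contribute nothing to the loss. Under the assumed dynamic rule they still contribute, but with less influence than the high-scoring sample.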
Problem

Research questions and friction points this paper is trying to address.

Reducing annotation costs in 3D-RES with semi-supervised learning
Improving pseudo-label utilization in teacher-student SSL frameworks
Enhancing model performance with limited labeled 3D-RES data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Teacher-Student Consistency-Based Sampling for pseudo-labels
Quality-Driven Dynamic Weighting for low-quality labels
Semi-supervised 3D-RES framework with limited labeled data
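As a rough illustration of consistency-based sampling, the sketch below selects unlabeled samples whose teacher and student masks agree closely, so they could be moved into the labeled pool. The summary above does not state how consistency is measured. Using mask IoU as the agreement score, the `min_iou` threshold, and the function names are all assumptions for this example, not details taken from the paper.

```python
def mask_iou(mask_a, mask_b):
    """IoU between two binary per-point masks given as equal-length 0/1 lists.
    Returns 0.0 when both masks are empty, so empty predictions are never selected."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 0.0


def select_consistent(teacher_masks, student_masks, min_iou=0.8):
    """Return indices of samples where teacher and student agree (assumed criterion).
    The teacher masks at these indices could then be treated as extra labeled data."""
    selected = []
    for idx, (t, s) in enumerate(zip(teacher_masks, student_masks)):
        if mask_iou(t, s) >= min_iou:
            selected.append(idx)
    return selected


# Toy example: three unlabeled scenes with four points each.
teacher = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
student = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

print(select_consistent(teacher, student))
```

Only the first scene is selected here: the second has disjoint masks, and the third has empty masks on both sides.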