🤖 AI Summary
This work addresses the performance bottleneck in weakly supervised semantic segmentation caused by reliance on sparse annotations. The authors propose a novel feed-forward 3D scene reconstruction–assisted supervision framework that leverages geometric structure recovered from 2D video sequences to propagate sparse labels across entire images. For the first time, feed-forward 3D reconstruction is integrated with weakly supervised 2D segmentation through a dual student–teacher architecture that enforces cross-modal semantic consistency between 2D and 3D representations. The method achieves state-of-the-art performance under sparse supervision without requiring additional annotations or incurring extra inference overhead, outperforming existing approaches by 2–7% in segmentation accuracy.
📝 Abstract
We present Rewis3d, a framework that leverages recent advances in feed-forward 3D reconstruction to significantly improve weakly supervised semantic segmentation on 2D images. Obtaining dense, pixel-level annotations remains a costly bottleneck for training segmentation models. Sparse annotations alleviate this burden and offer an efficient weakly supervised alternative, but they still incur a performance gap. To close this gap, we introduce a novel approach that uses 3D scene reconstruction as an auxiliary supervisory signal. Our key insight is that 3D geometric structure recovered from 2D videos provides strong cues for propagating sparse annotations across entire scenes. Specifically, a dual student–teacher architecture enforces semantic consistency between 2D images and reconstructed 3D point clouds, using state-of-the-art feed-forward reconstruction to generate reliable geometric supervision. Extensive experiments demonstrate that Rewis3d achieves state-of-the-art performance under sparse supervision, outperforming existing approaches by 2–7% without requiring additional labels or extra inference overhead.
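To make the cross-modal consistency idea concrete, here is a minimal sketch (not the authors' implementation; the function name, shapes, and choice of a KL-divergence objective are assumptions) of a loss that aligns a 2D student's per-pixel class probabilities with a 3D teacher's per-point predictions, linked by the pixel each reconstructed point projects to:

```python
# Illustrative sketch of a 2D–3D cross-modal consistency loss.
# All names and shapes are assumptions, not the paper's API.
import numpy as np

def cross_modal_consistency(probs_2d, probs_3d, pix_idx, eps=1e-8):
    """Mean KL(teacher_3d || student_2d) over projected points.

    probs_2d: (H*W, C) student class probabilities per pixel (flattened)
    probs_3d: (N, C)   teacher class probabilities per 3D point
    pix_idx:  (N,)     index of the pixel each 3D point projects onto
    """
    p2d = probs_2d[pix_idx]  # gather student predictions at projected pixels
    kl = np.sum(probs_3d * (np.log(probs_3d + eps) - np.log(p2d + eps)), axis=1)
    return float(np.mean(kl))

# Toy example: 2 pixels, 3 classes, 2 points projecting onto them.
probs_2d = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1]])
probs_3d = np.array([[0.7, 0.2, 0.1],   # agrees with pixel 0
                     [0.1, 0.8, 0.1]])  # agrees with pixel 1
pix_idx = np.array([0, 1])
loss = cross_modal_consistency(probs_2d, probs_3d, pix_idx)  # ~0 when views agree
```

When the two modalities agree, the loss vanishes; when the 3D teacher confidently disagrees with the 2D student at a projected pixel, the loss grows, pushing the student toward the geometry-derived labels.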