DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high cost of pixel-level annotations in RGB-D scene parsing, this paper proposes DepthMatch, a semi-supervised learning framework that substantially reduces reliance on labeled data. Methodologically, the authors introduce a complementary block-wise mixing augmentation strategy, design a lightweight spatial prior injector that explicitly models geometric structural constraints, and propose a depth-guided boundary loss that sharpens edge localization and improves the efficiency of multimodal feature fusion. DepthMatch achieves state-of-the-art results on the NYUv2 dataset, ranks first on the KITTI Semantics benchmark, and generalizes well across both indoor and outdoor scenes, offering a cost-effective route to high-accuracy RGB-D semantic parsing.

📝 Abstract
RGB-D scene parsing methods effectively capture both semantic and geometric features of the environment, demonstrating great potential under challenging conditions such as extreme weather and low lighting. However, existing RGB-D scene parsing methods predominantly rely on supervised training strategies, which require a large amount of manually annotated pixel-level labels that are both time-consuming and costly. To overcome these limitations, we introduce DepthMatch, a semi-supervised learning framework that is specifically designed for RGB-D scene parsing. To make full use of unlabeled data, we propose complementary patch mix-up augmentation to explore the latent relationships between texture and spatial features in RGB-D image pairs. We also design a lightweight spatial prior injector to replace traditional complex fusion modules, improving the efficiency of heterogeneous feature fusion. Furthermore, we introduce depth-guided boundary loss to enhance the model's boundary prediction capabilities. Experimental results demonstrate that DepthMatch exhibits high applicability in both indoor and outdoor scenes, achieving state-of-the-art results on the NYUv2 dataset and ranking first on the KITTI Semantics benchmark.
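The complementary patch mix-up augmentation described in the abstract is not detailed here, but a plausible reading is a CutMix-style block mask applied in complementary fashion to the two modalities: where the RGB stream takes patches from one sample, the depth stream takes the corresponding patches from the other, forcing the model to relate texture and geometry across samples. The sketch below illustrates that assumed scheme (the masking granularity, function name, and the exact complementary pairing are assumptions, not the paper's specification):

```python
import numpy as np

def complementary_patch_mixup(rgb_a, depth_a, rgb_b, depth_b, patch=32, rng=None):
    """Illustrative sketch (assumed form, not the paper's exact algorithm):
    mix two RGB-D pairs with a random block-wise binary mask M, applying M
    to the RGB stream and its complement (1 - M) to the depth stream, so the
    mixed pair exposes complementary texture/geometry cues."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = rgb_a.shape[:2]
    # Random binary mask on a coarse patch grid, upsampled to pixel resolution.
    grid = rng.integers(0, 2, size=(h // patch, w // patch)).astype(rgb_a.dtype)
    mask = np.kron(grid, np.ones((patch, patch), dtype=rgb_a.dtype))  # (H, W)
    m_rgb = mask[..., None]       # broadcast over the 3 RGB channels
    m_depth = 1.0 - mask          # complementary mask for the depth map
    rgb_mix = m_rgb * rgb_a + (1.0 - m_rgb) * rgb_b
    depth_mix = m_depth * depth_a + (1.0 - m_depth) * depth_b
    return rgb_mix, depth_mix
```

In a semi-supervised pipeline such a mixed pair would typically be trained against correspondingly mixed pseudo-labels; that step is omitted here for brevity.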
Problem

Research questions and friction points this paper is trying to address.

Semi-supervised RGB-D scene parsing with limited labeled data
Efficient fusion of RGB and depth features
Improved boundary prediction in scene parsing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised RGB-D learning framework
Complementary patch mix-up augmentation
Depth-guided boundary loss enhancement
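The depth-guided boundary loss listed above is only named, not specified. One common way to realize such a loss, offered here purely as a hedged sketch (the weighting scheme, gradient operator, and all parameter names are assumptions), is to up-weight the per-pixel segmentation loss where the depth map changes sharply, on the premise that depth discontinuities often coincide with object boundaries:

```python
import torch
import torch.nn.functional as F

def depth_guided_boundary_loss(logits, target, depth,
                               base_weight=1.0, boundary_weight=4.0):
    """Illustrative sketch (assumed form, not the paper's exact loss):
    weight per-pixel cross-entropy by the normalized depth-gradient
    magnitude, so pixels near depth discontinuities count more.
    logits: (N, C, H, W); target: (N, H, W) long; depth: (N, H, W)."""
    # Depth gradient magnitude via forward finite differences.
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    gx = F.pad(dx, (0, 1))            # pad width back to W
    gy = F.pad(dy, (0, 0, 0, 1))      # pad height back to H
    edge = (gx + gy) / (gx + gy).amax().clamp(min=1e-6)  # normalize to [0, 1]
    weight = base_weight + boundary_weight * edge         # (N, H, W)
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    return (weight * ce).mean()
```

With `boundary_weight=0` this reduces to plain mean cross-entropy, which makes the boundary term easy to ablate.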
Jianxin Huang
Zhejiang University
depth estimation, VLM, Stereo
Jiahang Li
College of Electronics & Information Engineering, Shanghai Institute of Intelligent Science and Technology, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China
Sergey Vityazev
Ryazan State Radio Engineering University, Ryazan 390035, Russian Federation
Alexander Dvorkovich
Multimedia Technology and Telecom Department, Telecommunications Center, Moscow Institute of Physics and Technology, 141701, Institutsky Lane, 9, Dolgoprudny, Moscow Region, Russian Federation
Rui Fan
College of Electronics & Information Engineering, Shanghai Institute of Intelligent Science and Technology, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China