DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high cost of pixel-level annotations in RGB-D scene parsing, this paper proposes DepthMatch, a semi-supervised learning framework that substantially reduces reliance on labeled data. Methodologically, the authors introduce a complementary block-wise mixing augmentation strategy, design a lightweight spatial prior injector that explicitly models geometric structural constraints, and propose a depth-guided boundary loss that sharpens edge localization and improves the efficiency of multimodal feature fusion. DepthMatch achieves state-of-the-art results on the NYUv2 dataset, ranks first on the KITTI Semantics benchmark, and generalizes well across both indoor and outdoor scenes, offering a cost-effective route to high-accuracy RGB-D semantic parsing.

📝 Abstract
RGB-D scene parsing methods effectively capture both semantic and geometric features of the environment, demonstrating great potential under challenging conditions such as extreme weather and low lighting. However, existing RGB-D scene parsing methods predominantly rely on supervised training strategies, which require a large amount of manually annotated pixel-level labels that are both time-consuming and costly. To overcome these limitations, we introduce DepthMatch, a semi-supervised learning framework that is specifically designed for RGB-D scene parsing. To make full use of unlabeled data, we propose complementary patch mix-up augmentation to explore the latent relationships between texture and spatial features in RGB-D image pairs. We also design a lightweight spatial prior injector to replace traditional complex fusion modules, improving the efficiency of heterogeneous feature fusion. Furthermore, we introduce depth-guided boundary loss to enhance the model's boundary prediction capabilities. Experimental results demonstrate that DepthMatch exhibits high applicability in both indoor and outdoor scenes, achieving state-of-the-art results on the NYUv2 dataset and ranking first on the KITTI Semantics benchmark.
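The complementary patch mix-up augmentation described in the abstract is not detailed here, but a plausible reading is a CutMix-style block mask applied in complementary fashion to the two modalities: where the RGB stream takes patches from one sample, the depth stream takes the corresponding patches from the other, forcing the model to relate texture and geometry across samples. The sketch below illustrates that assumed scheme (the masking granularity, function name, and the exact complementary pairing are assumptions, not the paper's specification):

```python
import numpy as np

def complementary_patch_mixup(rgb_a, depth_a, rgb_b, depth_b, patch=32, rng=None):
    """Illustrative sketch (assumed form, not the paper's exact algorithm):
    mix two RGB-D pairs with a random block-wise binary mask M, applying M
    to the RGB stream and its complement (1 - M) to the depth stream, so the
    mixed pair exposes complementary texture/geometry cues."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = rgb_a.shape[:2]
    # Random binary mask on a coarse patch grid, upsampled to pixel resolution.
    grid = rng.integers(0, 2, size=(h // patch, w // patch)).astype(rgb_a.dtype)
    mask = np.kron(grid, np.ones((patch, patch), dtype=rgb_a.dtype))  # (H, W)
    m_rgb = mask[..., None]       # broadcast over the 3 RGB channels
    m_depth = 1.0 - mask          # complementary mask for the depth map
    rgb_mix = m_rgb * rgb_a + (1.0 - m_rgb) * rgb_b
    depth_mix = m_depth * depth_a + (1.0 - m_depth) * depth_b
    return rgb_mix, depth_mix
```

In a semi-supervised pipeline such a mixed pair would typically be trained against correspondingly mixed pseudo-labels; that step is omitted here for brevity.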
Problem

Research questions and friction points this paper is trying to address.

Semi-supervised RGB-D scene parsing with limited labeled data
Efficient fusion of RGB and depth features
Improved boundary prediction in scene parsing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised RGB-D learning framework
Complementary patch mix-up augmentation
Depth-guided boundary loss enhancement
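The depth-guided boundary loss listed above is only named, not specified. One common way to realize such a loss, offered here purely as a hedged sketch (the weighting scheme, gradient operator, and all parameter names are assumptions), is to up-weight the per-pixel segmentation loss where the depth map changes sharply, on the premise that depth discontinuities often coincide with object boundaries:

```python
import torch
import torch.nn.functional as F

def depth_guided_boundary_loss(logits, target, depth,
                               base_weight=1.0, boundary_weight=4.0):
    """Illustrative sketch (assumed form, not the paper's exact loss):
    weight per-pixel cross-entropy by the normalized depth-gradient
    magnitude, so pixels near depth discontinuities count more.
    logits: (N, C, H, W); target: (N, H, W) long; depth: (N, H, W)."""
    # Depth gradient magnitude via forward finite differences.
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    gx = F.pad(dx, (0, 1))            # pad width back to W
    gy = F.pad(dy, (0, 0, 0, 1))      # pad height back to H
    edge = (gx + gy) / (gx + gy).amax().clamp(min=1e-6)  # normalize to [0, 1]
    weight = base_weight + boundary_weight * edge         # (N, H, W)
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    return (weight * ce).mean()
```

With `boundary_weight=0` this reduces to plain mean cross-entropy, which makes the boundary term easy to ablate.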
Jianxin Huang
Zhejiang University
depth estimation, VLM, Stereo
Jiahang Li
College of Electronics & Information Engineering, Shanghai Institute of Intelligent Science and Technology, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China
Sergey Vityazev
Ryazan State Radio Engineering University, Ryazan 390035, Russian Federation
Alexander Dvorkovich
Multimedia Technology and Telecom Department, Telecommunications Center, Moscow Institute of Physics and Technology, 141701, Institutsky Lane, 9, Dolgoprudny, Moscow Region, Russian Federation
Rui Fan
College of Electronics & Information Engineering, Shanghai Institute of Intelligent Science and Technology, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China