S2ML: Spatio-Spectral Mutual Learning for Depth Completion

📅 2025-11-08
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
Raw depth maps from RGB-D cameras often suffer from severe missing values due to low reflectivity, boundary shadows, and other artifacts, significantly degrading downstream task performance. To address this, we propose a novel depth completion framework that jointly exploits spatial and frequency-domain features. Specifically, we are the first to explicitly model the distribution characteristics of raw depth maps in the frequency domain—i.e., amplitude and phase spectra—and design a dedicated spectral fusion module. Within a unified embedding space, our method jointly captures both local and global correlations between spatial and frequency-domain features. Furthermore, we introduce a Transformer-based cross-domain feature extraction mechanism coupled with a progressive mutual learning strategy. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate state-of-the-art performance: our method achieves PSNR gains of +0.828 dB and +0.834 dB over CFormer, respectively, significantly improving completion accuracy and structural fidelity.

📝 Abstract
The raw depth images captured by RGB-D cameras using Time-of-Flight (TOF) or structured light often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifacts, which limit their applications in downstream vision tasks. Existing methods address this problem through depth completion in the image domain, but they overlook the physical characteristics of raw depth images. It has been observed that the presence of invalid depth areas alters the frequency distribution pattern. In this work, we propose a Spatio-Spectral Mutual Learning framework (S2ML) to harmonize the advantages of both spatial and frequency domains for depth completion. Specifically, we consider the distinct properties of amplitude and phase spectra and devise a dedicated spectral fusion module. Meanwhile, the local and global correlations between spatial-domain and frequency-domain features are calculated in a unified embedding space. The gradual mutual representation and refinement encourage the network to fully explore complementary physical characteristics and priors for more accurate depth completion. Extensive experiments demonstrate the effectiveness of our proposed S2ML method, outperforming the state-of-the-art method CFormer by 0.828 dB and 0.834 dB on the NYU-Depth V2 and SUN RGB-D datasets, respectively.
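The abstract's key observation, that invalid depth areas alter the frequency distribution pattern, can be illustrated with a small NumPy sketch. This is not the paper's code: the smooth-ramp "depth map", the zeroed hole, and the low-frequency window size are all made up for illustration.

```python
import numpy as np

# Toy "depth map": a smooth horizontal ramp, whose energy sits
# almost entirely in low spatial frequencies.
h, w = 64, 64
depth = np.tile(np.linspace(1.0, 3.0, w), (h, 1))

# Simulate an invalid region (e.g. a weak reflection) as zeros,
# the way raw ToF / structured-light sensors typically report it.
corrupted = depth.copy()
corrupted[20:40, 20:40] = 0.0

def amplitude_spectrum(img):
    """Centered amplitude spectrum |F(img)|."""
    return np.abs(np.fft.fftshift(np.fft.fft2(img)))

amp_clean = amplitude_spectrum(depth)
amp_holes = amplitude_spectrum(corrupted)

# Compare energy outside a small window around DC: the sharp-edged
# hole injects broadband high-frequency energy into the spectrum.
cy, cx = h // 2, w // 2
high = np.ones((h, w), dtype=bool)
high[cy - 4:cy + 4, cx - 4:cx + 4] = False
print(amp_holes[high].sum() > amp_clean[high].sum())  # expect True
```

The clean ramp concentrates its spectrum near DC, while the zeroed square spreads energy across high frequencies, which is the distribution shift S2ML exploits.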
Problem

Research questions and friction points this paper is trying to address.

Completing incomplete depth images from RGB-D cameras
Accounting for invalid depth areas that alter the frequency distribution pattern
Harmonizing spatial and frequency domains for depth completion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-Spectral Mutual Learning framework for depth completion
Dedicated spectral fusion module for amplitude and phase
Unified embedding space for spatial and frequency correlations
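The spectral fusion module is described as treating the distinct properties of amplitude and phase spectra separately. A minimal NumPy sketch of that decompose/recombine idea follows; the fixed mixing weight `alpha` and the function name `spectral_fuse` are illustrative stand-ins, not the paper's learned module.

```python
import numpy as np

def spectral_fuse(feat_a, feat_b, alpha=0.5):
    """Mix the amplitude spectra of two feature maps while keeping
    feat_a's phase, then return to the spatial domain.

    Only the amplitude/phase split follows the paper; the fixed
    weight `alpha` stands in for S2ML's learned fusion.
    """
    fa, fb = np.fft.fft2(feat_a), np.fft.fft2(feat_b)
    amplitude = alpha * np.abs(fa) + (1.0 - alpha) * np.abs(fb)
    phase = np.angle(fa)                     # structure lives largely in phase
    fused = amplitude * np.exp(1j * phase)   # recombine amplitude and phase
    return np.real(np.fft.ifft2(fused))

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))
b = rng.standard_normal((32, 32))
out = spectral_fuse(a, b)
```

Fusing an input with itself is the identity (amplitude and phase both come from the same transform), a quick sanity check that the round trip through the FFT is lossless.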
Zihui Zhao
Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China, 518071
Yifei Zhang
Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China, 518071
Zheng Wang
School of Computer Science, Wuhan University, 430072, China
Yang Li
Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China, 518071
Kui Jiang
Harbin Institute of Technology
computer vision, image processing, deep learning
Zihan Geng
Tsinghua University-SIGS
Computational Imaging, Optical Signal Processing, Light Field Manipulation
Chia-Wen Lin
Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University