SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mobile dToF sensors suffer from sparse, noisy, and temporally inconsistent depth maps due to hardware constraints and inherent physical imaging limitations. To address this, we propose SVDC, a video-based depth completion method. SVDC introduces an Adaptive Frequency-Selective Fusion (AFSF) module that uses Channel-Spatial Enhancement Attention (CSEA) to generate attention maps as fusion weights, dynamically selecting convolutional kernel sizes. Additionally, a cross-window consistency loss effectively suppresses temporal flickering in depth videos. Leveraging multi-frame RGB guidance for dToF data, SVDC jointly optimizes spatiotemporal continuity and edge-detail recovery. Extensive experiments on TartanAir and Dynamic Replica demonstrate that SVDC achieves state-of-the-art performance in depth accuracy, temporal consistency, and edge fidelity, outperforming existing methods across all metrics.

📝 Abstract
Lightweight direct Time-of-Flight (dToF) sensors are ideal for 3D sensing on mobile devices. However, due to the manufacturing constraints of compact devices and the inherent physical principles of imaging, dToF depth maps are sparse and noisy. In this paper, we propose a novel video depth completion method, called SVDC, by fusing the sparse dToF data with the corresponding RGB guidance. Our method employs a multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the sparse dToF imaging. Misalignment between consecutive frames during multi-frame fusion could cause blending between object edges and the background, which results in a loss of detail. To address this, we introduce an adaptive frequency selective fusion (AFSF) module, which automatically selects convolution kernel sizes to fuse multi-frame features. Our AFSF utilizes a channel-spatial enhancement attention (CSEA) module to enhance features and generates an attention map as fusion weights. The AFSF ensures edge detail recovery while suppressing high-frequency noise in smooth regions. To further enhance temporal consistency, we propose a cross-window consistency loss that enforces consistent predictions across different windows, effectively reducing flickering. Our proposed SVDC achieves state-of-the-art accuracy and consistency on the TartanAir and Dynamic Replica datasets. Code is available at https://github.com/Lan1eve/SVDC.
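The cross-window consistency idea can be illustrated with a minimal sketch: when depth is predicted over overlapping temporal windows, the frames shared by two windows should receive the same depths, and any disagreement can be penalized directly. The function name, array shapes, and plain L1 penalty below are illustrative assumptions, not the paper's actual loss formulation.

```python
import numpy as np

def cross_window_consistency_loss(window_a, window_b, overlap):
    """Hypothetical sketch of a cross-window consistency penalty.

    window_a, window_b: arrays of shape (T, H, W) holding per-frame depth
    predictions from two overlapping temporal windows. The last `overlap`
    frames of window_a cover the same timestamps as the first `overlap`
    frames of window_b, so their depths should agree.
    """
    tail = window_a[-overlap:]   # shared frames as seen by window A
    head = window_b[:overlap]    # the same frames as seen by window B
    # Mean absolute disagreement over the shared frames (L1 penalty).
    return np.abs(tail - head).mean()

# Toy example: two 4-frame windows overlapping by 2 frames,
# with a 0.5 m flicker on one shared frame.
a = np.zeros((4, 8, 8))
b = np.zeros((4, 8, 8))
b[0] += 0.5
loss = cross_window_consistency_loss(a, b, overlap=2)
```

In training, such a term would be added to the data loss so that sliding-window predictions cannot drift between windows, which is what suppresses frame-to-frame flickering.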
Problem

Research questions and friction points this paper is trying to address.

Sparse and noisy dToF depth maps in mobile devices
Misalignment and detail loss in multi-frame fusion
Temporal inconsistency and flickering in depth predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses sparse dToF data with RGB guidance
Uses adaptive frequency selective fusion module
Ensures temporal consistency with cross-window loss
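The frequency-selective fusion idea can be sketched as blending two receptive fields under a learned attention map: a small kernel preserves edges and high-frequency detail, a large kernel smooths noise in flat regions. The sketch below uses plain mean filters and takes the attention map as given (in SVDC it is produced by the CSEA module); the function names and kernel sizes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def mean_filter(x, k):
    """Naive k x k mean filter with edge padding (for illustration only)."""
    p = k // 2
    xp = np.pad(x, p, mode="edge")
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def frequency_selective_fuse(feat, attn):
    """Blend a small-kernel branch with a large-kernel branch.

    attn in [0, 1]: near 1 at edges (keep high-frequency detail),
    near 0 in smooth regions (suppress high-frequency noise).
    """
    detail = mean_filter(feat, 3)   # small receptive field: keeps edges
    smooth = mean_filter(feat, 7)   # large receptive field: averages noise
    return attn * detail + (1.0 - attn) * smooth
```

With `attn` at 1 everywhere, the output reduces to the small-kernel branch; at 0, to the large-kernel branch. A spatially varying map interpolates between the two per pixel, which is the adaptive kernel-size selection the Innovation bullets describe.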
👥 Authors
Xuan Zhu, Huazhong University of Science and Technology
Jijun Xiang, Huazhong University of Science and Technology (Computer Vision)
Xianqi Wang, Huazhong University of Science and Technology (Stereo Matching)
Longliang Liu, Huazhong University of Science and Technology (optical flow, stereo matching, depth estimation)
Yu Wang, Honor Device Co., Ltd
Hong Zhang, Honor Device Co., Ltd
Fei Guo, Honor Device Co., Ltd
Xin Yang, Huazhong University of Science and Technology