SpatioTemporal Difference Network for Video Depth Super-Resolution

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the long-tailed distribution of spatially non-smooth and temporally dynamic regions in video depth super-resolution, this paper proposes the Spatio-Temporal Difference Network (STDNet). STDNet features a dual-branch architecture: a spatial difference branch dynamically aligns RGB and depth features within each frame for intra-frame fine-grained calibration, while a temporal difference branch models inter-frame motion dynamics and preferentially propagates temporal-discrepancy information to mitigate temporal long-tail effects. By jointly optimizing over multi-frame RGB and depth data, STDNet improves spatial detail reconstruction while preserving temporal consistency. Extensive experiments on multiple benchmark datasets show that STDNet outperforms state-of-the-art methods in both PSNR and SSIM, with particularly notable gains in long-tail scenarios such as object boundaries and motion-prone regions, where spatial fidelity and temporal coherence are hardest to maintain.
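The intra-frame RGB-D calibration idea can be illustrated with a minimal sketch. Note that the function name, the sigmoid gate, and the fusion rule below are illustrative assumptions, not the paper's actual learned spatial difference mechanism, which dynamically aligns features rather than applying a fixed formula.

```python
import numpy as np

def spatial_difference_gate(rgb_feat, depth_feat):
    """Illustrative intra-frame RGB-D aggregation via a spatial-difference gate.

    Hypothetical sketch: STDNet's real branch learns this alignment.
    Inputs are (C, H, W) feature maps of the same shape.
    """
    # Spatial difference representation: large where RGB and depth
    # features disagree (e.g. at object boundaries).
    diff = np.abs(rgb_feat - depth_feat)
    gate = 1.0 / (1.0 + np.exp(-diff))  # sigmoid gate in [0.5, 1)
    # Calibrate depth features with gated RGB guidance.
    return depth_feat + gate * (rgb_feat - depth_feat)

rgb = np.random.rand(8, 16, 16).astype(np.float32)
dep = np.random.rand(8, 16, 16).astype(np.float32)
out = spatial_difference_gate(rgb, dep)
print(out.shape)  # (8, 16, 16)
```

The gate opens wider exactly in the non-smooth regions the paper identifies as long-tailed, so RGB guidance is injected most strongly there.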

📝 Abstract
Depth super-resolution has achieved impressive performance, and the incorporation of multi-frame information further enhances reconstruction quality. Nevertheless, statistical analyses reveal that video depth super-resolution remains affected by pronounced long-tailed distributions, with the long-tailed effects primarily manifesting in spatial non-smooth regions and temporal variation zones. To address these challenges, we propose a novel SpatioTemporal Difference Network (STDNet) comprising two core branches: a spatial difference branch and a temporal difference branch. In the spatial difference branch, we introduce a spatial difference mechanism to mitigate the long-tailed issues in spatial non-smooth regions. This mechanism dynamically aligns RGB features with learned spatial difference representations, enabling intra-frame RGB-D aggregation for depth calibration. In the temporal difference branch, we further design a temporal difference strategy that preferentially propagates temporal variation information from adjacent RGB and depth frames to the current depth frame, leveraging temporal difference representations to achieve precise motion compensation in temporal long-tailed areas. Extensive experimental results across multiple datasets demonstrate the effectiveness of our STDNet, outperforming existing approaches.
Problem

Research questions and friction points this paper is trying to address.

Addresses long-tailed distributions in video depth super-resolution
Improves depth quality in spatial non-smooth regions
Enhances temporal motion compensation in variation zones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial difference mechanism for non-smooth regions
Temporal difference strategy for motion compensation
RGB-D aggregation for depth calibration
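The temporal difference strategy can likewise be sketched in a few lines. The weighting rule and function name here are assumptions for illustration only; the paper's branch learns motion compensation from temporal difference representations rather than using this fixed blend.

```python
import numpy as np

def temporal_difference_propagate(depth_prev, depth_cur, rgb_prev, rgb_cur):
    """Illustrative inter-frame propagation weighted by a temporal-difference map.

    Hypothetical sketch, not the paper's operator. Inputs are (H, W) maps.
    """
    # Temporal difference from the RGB stream: large values mark
    # motion-prone (temporally long-tailed) regions.
    t_diff = np.abs(rgb_cur - rgb_prev)
    weight = t_diff / (t_diff.max() + 1e-8)  # normalize to [0, 1]
    # Static regions (weight ~ 0) reuse the previous depth; dynamic
    # regions (weight ~ 1) favor the current estimate.
    return weight * depth_cur + (1.0 - weight) * depth_prev

rgb_prev = np.zeros((4, 4))
rgb_cur = np.zeros((4, 4))
rgb_cur[0, 0] = 1.0  # motion at a single pixel
fused = temporal_difference_propagate(np.ones((4, 4)), 2 * np.ones((4, 4)),
                                      rgb_prev, rgb_cur)
```

In this toy example the moving pixel takes the current depth while static pixels keep the previous one, which is the "preferential propagation of temporal variation information" the bullets above describe.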
👥 Authors
Zhengxue Wang, Nanjing University of Science and Technology (Depth/RGB image restoration)
Yuan Wu, PCA Lab, Nanjing University of Science and Technology
Xiang Li, Nankai University
Zhiqiang Yan, National University of Singapore (3D computer vision, depth perception, occupancy prediction)
Jian Yang, PCA Lab, Nanjing University of Science and Technology