Self-supervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

📅 2024-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the degraded robustness of self-supervised monocular depth estimation under adverse weather conditions and imaging noise, proposing a diffusion-model-based robust self-supervised framework. Its key contributions are: (1) a hierarchical feature-guided denoising module that jointly leverages multi-scale visual features to enhance depth perception from blurred or noisy images; and (2) an implicit depth consistency loss that decouples the depth constraint from the pose subnetwork while enforcing inter-frame scale consistency within video sequences, eliminating reliance on ground-truth depth. The method is fully self-supervised and requires only monocular video sequences. Evaluated on KITTI and Make3D, the approach outperforms existing generative-based methods, improving both depth accuracy and robustness against blur and noise.

📝 Abstract
Self-supervised monocular depth estimation has received widespread attention because it can be trained without ground truth. In real-world scenarios, images may be blurry or noisy due to weather conditions and inherent camera limitations, so developing a robust depth estimation model is particularly important. Benefiting from the training strategies of generative networks, generative-based methods often exhibit enhanced robustness. In light of this, we employ a generative diffusion model, with its unique denoising training process, for self-supervised monocular depth estimation. To further enhance the robustness of the diffusion model, we probe the influence of perturbations on image features and propose a hierarchical feature-guided denoising module. Furthermore, we explore the implicit depth within reprojection and design an implicit depth consistency loss. This loss is not affected by the other subnetwork, so it can specifically constrain the depth estimation network and ensure the scale consistency of depth within a video sequence. Experiments on the KITTI and Make3D datasets show that our approach stands out among generative-based models while exhibiting remarkable robustness.
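The reprojection that the implicit depth consistency idea builds on can be illustrated with a minimal pinhole-camera sketch. This is not the paper's implementation: the function names, the numpy setup, and reading the implied source-frame depth off the transformed points' z-coordinate are illustrative assumptions.

```python
import numpy as np

def backproject(depth, K_inv):
    # Lift every pixel (u, v) to a 3D point in the target camera frame:
    # P = depth * K^-1 [u, v, 1]^T
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x HW
    return depth.reshape(1, -1) * (K_inv @ pix)                     # 3 x HW

def reproject(points, K, R, t):
    # Transform points into the source camera frame and project to pixels.
    p_src = R @ points + t.reshape(3, 1)            # 3 x HW
    z = p_src[2]                                    # implied depth in the source frame;
                                                    # a consistency term could compare this
                                                    # against the source depth prediction
    uv = (K @ p_src)[:2] / np.clip(z, 1e-6, None)   # 2 x HW pixel coordinates
    return uv, z
```

With an identity relative pose, pixels reproject onto themselves and the implied depth equals the input depth, which makes the geometry easy to sanity-check.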
Problem

Research questions and friction points this paper is trying to address.

Develop robust self-supervised monocular depth estimation without ground truth
Enhance diffusion model robustness via hierarchical feature-guided denoising
Ensure scale consistency in depth estimation using implicit depth loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical feature-guided denoising module
Implicit depth consistency loss design
Diffusion model for depth estimation
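To make the diffusion side concrete, here is a minimal standard DDPM reverse step in numpy. The paper conditions its noise predictor on hierarchical image features; here `eps_pred` stands in for that predictor's output, and the simple variance choice `sigma_t = sqrt(1 - alpha_t)` is an assumption of this sketch, not taken from the paper.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_pred, alphas, alpha_bars, rng):
    # One reverse-diffusion update: remove the predicted noise from x_t
    # to estimate the mean of x_{t-1}, then add variance except at t = 0.
    a_t, ab_t = alphas[t], alpha_bars[t]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    if t == 0:
        return mean  # final step is deterministic
    sigma = np.sqrt(1.0 - a_t)  # simplified variance schedule (assumption)
    return mean + sigma * rng.standard_normal(x_t.shape)
```

If the predictor returns the exact noise that was added, the final step (t = 0) recovers the clean signal exactly, which makes a handy unit test for the update.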
Runze Liu
School of Information Science and Technology, ShanghaiTech University; Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Dongchen Zhu
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Guanghui Zhang
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Yue Xu
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Wenjun Shi
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Xiaolin Zhang
School of Information Science and Technology, ShanghaiTech University; Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Lei Wang
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Jiamao Li
Bionic Vision System Laboratory, State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences