🤖 AI Summary
This paper addresses the degraded robustness of self-supervised monocular depth estimation under adverse weather conditions and imaging noise by proposing a diffusion-based robust self-supervised framework. Its key contributions are: (1) a hierarchical feature-guided denoising module that leverages multi-scale visual features to preserve depth perception from blurred or noisy images; and (2) an implicit depth consistency loss, derived from the reprojection process, that constrains the depth network independently of the other subnetwork and enforces depth-scale consistency within a video sequence. The method requires no ground-truth depth, training only on monocular video sequences. Evaluated on KITTI and Make3D, the approach outperforms existing generative-based methods while improving both depth accuracy and robustness to blur and noise.
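As a concrete illustration of the feature-guidance idea, here is a minimal PyTorch sketch, assuming a denoiser that predicts the noise added to a depth map while multi-scale image features are injected at a shared resolution. `FeatureGuidedDenoiser`, its layer layout, and the additive injection are hypothetical stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGuidedDenoiser(nn.Module):
    """Illustrative sketch of hierarchical feature guidance: multi-scale
    image features steer a diffusion denoiser that predicts the noise
    added to a depth map. Shapes and module names are assumptions."""

    def __init__(self, feat_channels=(64, 128, 256), base=64):
        super().__init__()
        # Project each image-feature scale to a common guidance width.
        self.guides = nn.ModuleList(
            nn.Conv2d(c, base, kernel_size=1) for c in feat_channels
        )
        self.head = nn.Conv2d(1, base, 3, padding=1)   # embed noisy depth
        self.body = nn.Sequential(
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(base, 1, 3, padding=1)    # predict the noise

    def forward(self, noisy_depth, image_feats, t_emb):
        # noisy_depth: (B,1,H,W); image_feats: coarse-to-fine feature maps;
        # t_emb: (B, base) diffusion-timestep embedding.
        x = self.head(noisy_depth) + t_emb[:, :, None, None]
        for proj, feat in zip(self.guides, image_feats):
            # Resize each scale's guidance to the working resolution and
            # inject it additively (one simple form of feature guidance).
            g = F.interpolate(proj(feat), size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
            x = x + g
        return self.out(self.body(x))


# Toy usage with random tensors standing in for encoder features.
B, H, W = 2, 96, 320
feats = [torch.randn(B, c, H // s, W // s)
         for c, s in zip((64, 128, 256), (4, 8, 16))]
eps_hat = FeatureGuidedDenoiser()(
    torch.randn(B, 1, H, W), feats, torch.randn(B, 64))
print(eps_hat.shape)  # torch.Size([2, 1, 96, 320])
```

Additive injection is only one plausible conditioning mechanism; concatenation- or attention-based guidance would fit the same interface.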
📝 Abstract
Self-supervised monocular depth estimation has received widespread attention because it can be trained without ground truth. In real-world scenarios, images may be blurry or noisy due to weather conditions and inherent limitations of the camera, so developing a robust depth estimation model is particularly important. Benefiting from their training strategies, generative methods often exhibit enhanced robustness. In light of this, we employ the diffusion model, a generative model with a distinctive denoising training process, for self-supervised monocular depth estimation. To further enhance the robustness of the diffusion model, we investigate how perturbations affect image features and propose a hierarchical feature-guided denoising module. Furthermore, we explore the depth implicit in reprojection and design an implicit depth consistency loss. Because this loss is unaffected by the other subnetwork, it can be targeted to constrain the depth estimation network and ensure scale consistency of depth within a video sequence. We conduct experiments on the KITTI and Make3D datasets; the results show that our approach stands out among generative-based models while exhibiting remarkable robustness.
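To make the "depth implicit in reprojection" idea concrete: when target-frame pixels are back-projected with the predicted depth and transformed by the predicted relative pose, the z-coordinate of the transformed points is itself a depth estimate, which can be compared against the depth map predicted for the other frame. The sketch below shows one plausible form of such a consistency term, assuming a standard structure-from-motion setup where a pose subnetwork predicts the relative camera motion; the function name and the exact penalty are assumptions, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def implicit_depth_consistency(depth_t, depth_s, T_t2s, K, K_inv):
    """Sketch of a reprojection-derived depth consistency term.
    depth_t/depth_s: predicted depth maps of two frames, (B,1,H,W);
    T_t2s: relative pose target->source, (B,4,4); K, K_inv: (B,3,3)
    intrinsics. Hypothetical form; the paper's loss may differ."""
    B, _, H, W = depth_t.shape
    dev = depth_t.device

    # Homogeneous pixel grid, (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()
    pix = pix.view(3, -1).expand(B, -1, -1)

    # Back-project target pixels and move them into the source frame.
    cam = (K_inv @ pix) * depth_t.view(B, 1, -1)
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)
    cam_s = (T_t2s @ cam)[:, :3]

    # The depth "implicit" in reprojection: z of the transformed points.
    z_implicit = cam_s[:, 2:3].view(B, 1, H, W).clamp(min=1e-6)

    # Sample the source depth prediction at the reprojected pixels.
    uv = K @ cam_s
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)
    u = 2 * uv[:, 0].view(B, H, W) / (W - 1) - 1
    v = 2 * uv[:, 1].view(B, H, W) / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1)                    # (B,H,W,2)
    z_sampled = F.grid_sample(depth_s, grid, align_corners=True)

    # Penalize scale drift between the two depth estimates; a real
    # implementation would also mask out-of-view pixels.
    return (torch.abs(z_implicit - z_sampled) /
            (z_implicit + z_sampled)).mean()
```

Because both terms come from the same pair of depth predictions, penalizing their normalized difference discourages per-frame scale drift while leaving the photometric reprojection loss untouched, which matches the stated goal of constraining the depth network without interference from the other subnetwork.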