🤖 AI Summary
Self-supervised monocular depth estimation suffers from degraded accuracy in dynamic scenes because the static-world assumption no longer holds. To address this, we propose a motion-aware pseudo-static reference frame modeling approach: optical-flow-guided frame alignment and coarse-depth-assisted synthesis generate a pseudo-static reference frame; a motion-aware cost volume is then constructed, and a multi-scale depth network integrating channel-wise and non-local attention jointly estimates depth for both dynamic objects and static backgrounds. The method achieves significant improvements in dynamic-scene robustness while maintaining computational efficiency: on KITTI-2015, it reduces RMSE by approximately 5% relative to baselines with similar computational cost. This demonstrates the effectiveness of jointly optimizing motion modeling and attention mechanisms for dynamic-scene depth estimation.
📝 Abstract
Despite advances in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on a static-world assumption. In this paper, we present Manydepth2, which achieves precise depth estimation for both dynamic objects and static backgrounds while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a pseudo-static reference frame. This frame is then used to build a motion-aware cost volume together with the vanilla target frame. Furthermore, to improve the accuracy and robustness of the network architecture, we propose an attention-based depth network that effectively integrates information from feature maps at different resolutions by incorporating both channel and non-local attention mechanisms. Compared to methods with similar computational costs, Manydepth2 achieves a significant reduction of approximately five percent in root-mean-square error for self-supervised monocular depth estimation on the KITTI-2015 dataset. The code is available at https://github.com/kaichen-z/Manydepth2.
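The core of the pseudo-static frame construction is backward warping: each target pixel is resampled from the reference frame along its optical-flow vector, so moving content is realigned before the cost volume is built. The following is a minimal NumPy sketch of that warping step only; the function name, shapes, and bilinear-sampling details are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def warp_with_flow(ref, flow):
    """Backward-warp a reference image toward the target view using optical flow.

    ref:  (H, W) grayscale reference frame
    flow: (H, W, 2) per-pixel displacement (dx, dy) from target to reference
    Returns the warped image, sampled with bilinear interpolation.
    (Illustrative sketch -- the paper's pipeline also uses coarse depth.)
    """
    H, W = ref.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # Coordinates to sample in the reference frame, clamped to the image.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    top = ref[y0, x0] * (1 - wx) + ref[y0, x1] * wx
    bot = ref[y1, x0] * (1 - wx) + ref[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

In a full implementation this would typically be done with a differentiable sampler (e.g. a bilinear grid sampler) on feature maps rather than raw pixels, so gradients can flow through the warp during self-supervised training.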