Video Depth Propagation

📅 2025-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video depth estimation methods face a fundamental trade-off: frame-wise monocular models lack temporal consistency, while explicit temporal modeling incurs high computational overhead and prevents real-time operation. To address this, we propose VeloDepth, a lightweight and robust online video depth estimation framework. Its core is a flow-guided depth feature propagation module, augmented with learnable residual correction and structured temporal consistency constraints, enabling precise inter-frame depth alignment and dynamic refinement. Crucially, VeloDepth is the first method to unify motion-guided propagation, online feature updating, and lightweight temporal regularization within a single-stage inference pipeline. In zero-shot evaluation, VeloDepth achieves state-of-the-art temporal consistency while matching the accuracy of leading methods, and runs significantly faster, fully satisfying real-time visual perception requirements.

📝 Abstract
Depth estimation in videos is essential for visual perception in real-world applications. However, existing methods either rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies, or use computationally demanding temporal modeling, unsuitable for real-time applications. These limitations significantly restrict general applicability and performance in practical settings. To address this, we propose VeloDepth, an efficient and robust online video depth estimation pipeline that effectively leverages spatiotemporal priors from previous depth predictions and performs deep feature propagation. Our method introduces a novel Propagation Module that refines and propagates depth features and predictions using flow-based warping coupled with learned residual corrections. In addition, our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency. Comprehensive zero-shot evaluation on multiple benchmarks demonstrates the state-of-the-art temporal consistency and competitive accuracy of VeloDepth, alongside its significantly faster inference compared to existing video-based depth estimators. VeloDepth thus provides a practical, efficient, and accurate solution for real-time depth estimation suitable for diverse perception tasks. Code and models are available at https://github.com/lpiccinelli-eth/velodepth
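The abstract's "structurally enforced temporal consistency" can be illustrated by the standard flow-warped consistency objective: the previous frame's depth is backward-warped into the current frame by optical flow, and any disagreement with the current prediction is penalized. The paper does not publish its exact formulation, so this is a minimal sketch of that generic idea; the function name and the L1 choice are assumptions, not VeloDepth's actual loss.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(depth_curr, depth_prev, flow):
    """Hypothetical sketch: penalize disagreement between the current depth
    map and the previous depth map warped into the current frame.
    depth_curr, depth_prev: (B, 1, H, W); flow: (B, 2, H, W), backward
    optical flow in pixels (current pixel -> previous-frame location)."""
    b, _, h, w = depth_curr.shape
    # Base pixel-coordinate grid for the current frame.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()          # (2, H, W)
    coords = base.unsqueeze(0) + flow                    # (B, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                 # (B, H, W, 2)
    warped_prev = F.grid_sample(depth_prev, grid, align_corners=True)
    return F.l1_loss(depth_curr, warped_prev)
```

With zero flow and identical depth maps the loss vanishes, which is the sanity check one would expect from any warping-based consistency term.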
Problem

Research questions and friction points this paper is trying to address.

Addresses temporal inconsistencies in video depth estimation
Reduces computational demands for real-time depth estimation
Improves general applicability of video depth estimation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online video depth estimation with spatiotemporal priors
Propagation Module using flow-based warping and learned corrections
Structurally enforced temporal consistency for stable predictions
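The propagation idea listed above (flow-based warping plus a learned residual correction) can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not VeloDepth's released code: the module name, channel layout, and single-conv residual head are assumptions chosen for brevity.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat_prev, flow):
    """Backward-warp previous-frame features into the current frame.
    feat_prev: (B, C, H, W); flow: (B, 2, H, W), offsets in pixels
    mapping each current-frame pixel to its previous-frame location."""
    b, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = grid + flow
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((gx, gy), dim=-1)               # (B, H, W, 2)
    return F.grid_sample(feat_prev, sample_grid, align_corners=True)

class PropagationModule(torch.nn.Module):
    """Hypothetical sketch of flow-guided propagation: warp last frame's
    depth features, then add a learned residual correction conditioned on
    the current frame's features."""
    def __init__(self, channels):
        super().__init__()
        self.residual = torch.nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat_curr, feat_prev, flow):
        warped = warp_with_flow(feat_prev, flow)
        correction = self.residual(torch.cat((feat_curr, warped), dim=1))
        return warped + correction
```

Because warping reuses the previous frame's features instead of recomputing them, each step costs one warp and one small convolution, which is the kind of lightweight update that makes online, real-time operation plausible.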