🤖 AI Summary
Real-time depth estimation for high-resolution video must jointly achieve high accuracy, inter-frame consistency, and low latency; this paper proposes a lightweight temporal enhancement framework that addresses all three. Built on a pretrained single-image depth model, it adds explicit temporal consistency constraints and a resolution-adaptive decoding architecture, requiring only minimal video data for fine-tuning. To the authors' knowledge, this is the first approach to enable real-time streaming inference at 2K resolution (2044×1148) and 24 FPS under lightweight fine-tuning while maintaining both high depth accuracy and strong inter-frame consistency. Evaluated on multiple unseen datasets, the method improves boundary sharpness by 32%, runs 2.1× faster than the state of the art, and retains competitive accuracy.
📝 Abstract
A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on 2044×1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities are enabled with relatively little data and training. We evaluate our approach on multiple unseen datasets against state-of-the-art depth models, and find that ours outperforms them in boundary sharpness and speed by a significant margin while maintaining competitive accuracy. We hope our model will enable applications that require high-resolution depth, such as video editing, as well as applications that require online decision-making, such as robotics.
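The core idea above — wrapping a pretrained single-image depth model with a lightweight temporal module so that streaming frames produce consistent depth maps — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: `single_image_depth` is a stand-in for a pretrained per-frame model, and the exponential blending in `TemporalDepthSmoother` is a hypothetical placeholder for FlashDepth's learned temporal consistency mechanism.

```python
import numpy as np

def single_image_depth(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained single-image depth model (hypothetical).

    Here we simply average the color channels to get a fake per-pixel
    'depth' so the streaming loop below is runnable end to end.
    """
    return frame.mean(axis=-1)

class TemporalDepthSmoother:
    """Hypothetical lightweight temporal module.

    Blends each new per-frame depth prediction with the previous output
    to reduce inter-frame flicker, mimicking (in spirit only) a temporal
    consistency constraint layered on top of a single-image model.
    """
    def __init__(self, alpha: float = 0.8):
        self.alpha = alpha      # weight on the current frame's prediction
        self.prev = None        # previous smoothed depth map

    def __call__(self, frame: np.ndarray) -> np.ndarray:
        depth = single_image_depth(frame)
        if self.prev is not None:
            depth = self.alpha * depth + (1.0 - self.alpha) * self.prev
        self.prev = depth
        return depth

# Streaming loop over a synthetic 2044x1148 video (3 random frames).
smoother = TemporalDepthSmoother(alpha=0.8)
rng = np.random.default_rng(0)
depth = None
for _ in range(3):
    frame = rng.random((1148, 2044, 3), dtype=np.float32)  # H x W x 3
    depth = smoother(frame)

print(depth.shape)  # one depth value per pixel: (1148, 2044)
```

In the real system, the temporal module and decoder are fine-tuned on a small amount of video data rather than hand-tuned like the blending weight here; the sketch only shows where such a module sits in a streaming pipeline.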