VDPP: Video Depth Post-Processing for Speed and Scalability

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video depth estimation methods struggle to efficiently adapt to new single-frame models, and their post-processing strategies are often constrained in terms of speed, accuracy, and practicality. This work proposes the VDPP framework, which enables efficient video depth post-processing through low-resolution geometric residual learning. By discarding reliance on RGB inputs and leveraging only geometric information, VDPP performs temporally consistent residual refinement. Its modular design facilitates flexible deployment, achieving 43.5 FPS on an NVIDIA Jetson Orin Nano while maintaining high accuracy, low memory consumption, and strong scalability—performance that rivals end-to-end systems.
📝 Abstract
Video depth estimation is essential for providing 3D scene structure in applications ranging from autonomous driving to mixed reality. Although current end-to-end (E2E) models have achieved state-of-the-art performance, they function as tightly coupled systems that suffer from a significant adaptation lag whenever superior single-image depth estimators are released. To mitigate this issue, post-processing methods such as NVDS offer a modular plug-and-play alternative that can incorporate any evolving image depth model without retraining. However, existing post-processing methods still struggle to match the efficiency and practicality of E2E systems due to limited speed, accuracy, and reliance on RGB inputs. In this work, we revitalize the role of post-processing by proposing VDPP (Video Depth Post-Processing), a framework that improves the speed and accuracy of post-processing methods for video depth estimation. By shifting the paradigm from computationally expensive scene reconstruction to targeted geometric refinement, VDPP operates purely on geometric residuals in low-resolution space. This design achieves exceptional speed (>43.5 FPS on an NVIDIA Jetson Orin Nano) while matching the temporal coherence of E2E systems, with dense residual learning refining geometric representations rather than performing full reconstructions. Furthermore, VDPP's RGB-free architecture ensures true scalability, enabling immediate integration with any evolving image depth model. Our results demonstrate that VDPP provides a superior balance of speed, accuracy, and memory efficiency, making it the most practical solution for real-time edge deployment. Our project page is at https://github.com/injun-baek/VDPP
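To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of RGB-free residual refinement in low-resolution space. The function name, the block-average downsampling, and the exponential-moving-average residual are all illustrative assumptions (the paper's actual residual network is learned); the sketch only shows the data flow: downsample each depth frame, compute a temporal correction at low resolution from geometry alone, then upsample the residual and add it back to the full-resolution prediction.

```python
import numpy as np

def refine_depth_sequence(depths, scale=4, alpha=0.5):
    """Illustrative sketch (not the paper's model): temporally smooth a
    sequence of single-image depth maps via a low-resolution residual.
    `alpha` controls how strongly each frame is pulled toward the
    previous frame's low-res depth; no RGB input is used."""
    prev_lr = None
    refined = []
    for d in depths:
        h, w = d.shape
        # Downsample by block averaging: geometric information only.
        lr = d[:h - h % scale, :w - w % scale].reshape(
            h // scale, scale, w // scale, scale).mean(axis=(1, 3))
        if prev_lr is None:
            residual_lr = np.zeros_like(lr)  # first frame: no correction
        else:
            # Residual pulling this frame toward the temporal estimate.
            residual_lr = alpha * (prev_lr - lr)
        prev_lr = lr + residual_lr  # running low-res estimate
        # Upsample the residual (nearest-neighbor) and apply it at full res.
        residual = np.kron(residual_lr, np.ones((scale, scale)))
        out = d.copy()
        out[:residual.shape[0], :residual.shape[1]] += residual
        refined.append(out)
    return refined
```

Because the correction is computed and stored at 1/`scale` resolution, both compute and memory stay small, which is the property that makes this style of post-processing plausible on edge hardware like the Jetson Orin Nano.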
Problem

Research questions and friction points this paper is trying to address.

video depth estimation
post-processing
adaptation lag
scalability
real-time deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

video depth estimation
post-processing
geometric refinement
RGB-free
real-time deployment