Boosting Multi-View Stereo with Depth Foundation Model in the Absence of Real-World Labels

📅 2025-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses multi-view stereo (MVS) reconstruction when no ground-truth depth labels are available. It proposes DFM-MVS, a framework that integrates depth foundation models (DFMs) to generate high-confidence depth priors. Building on these priors, the method establishes a prior-driven pseudo-supervised training paradigm and a prior-guided error correction module, which support coarse-to-fine stereo matching and explicit modeling of geometric consistency. DFM-MVS is trained without any real depth supervision and targets two bottlenecks of unsupervised MVS: severe noise in pseudo-labels and weak geometric constraints. Experiments on the DTU and Tanks & Temples benchmarks show that DFM-MVS outperforms existing unsupervised and self-supervised methods, with reconstruction accuracy reported to approach that of state-of-the-art supervised approaches. These results point to the value and generalizability of depth priors for MVS trained without real-world labels.

📝 Abstract

Learning-based Multi-View Stereo (MVS) methods have made remarkable progress in recent years. However, effectively training the network without real-world labels remains a challenging problem. In this paper, driven by recent advances in vision foundation models, a novel method termed DFM-MVS is proposed, which leverages a depth foundation model to generate an effective depth prior and thereby boosts MVS in the absence of real-world labels. Specifically, a depth prior-based pseudo-supervised training mechanism is developed to simulate realistic stereo correspondences using the generated depth prior, thereby constructing effective supervision for the MVS network. In addition, a depth prior-guided error correction strategy is presented that uses the depth prior as guidance to mitigate the error propagation problem inherent in the widely used coarse-to-fine network structure. Experimental results on the DTU and Tanks & Temples datasets demonstrate that the proposed DFM-MVS significantly outperforms existing MVS methods that do not use real-world labels.
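To make the phrase "stereo correspondences from a depth prior" concrete, here is a minimal NumPy sketch of the standard reprojection step that any depth map implies between two calibrated views. It is an illustration under stated assumptions, not the paper's pseudo-supervision pipeline: the function name `warp_pixels`, the pinhole camera model, and the convention that `R` and `t` map reference-camera coordinates to source-camera coordinates are all choices made for this example.

```python
import numpy as np

def warp_pixels(depth, K_ref, K_src, R, t):
    """Map every reference pixel into a source view using a depth map.

    depth        : (H, W) depth along the reference camera's z-axis
    K_ref, K_src : (3, 3) pinhole intrinsics (assumed model)
    R, t         : (3, 3) rotation and (3,) translation taking points from
                   reference-camera to source-camera coordinates (assumed convention)
    Returns an (H, W, 2) array of (u, v) pixel coordinates in the source view.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # both (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    rays = pix @ np.linalg.inv(K_ref).T   # back-project pixels to unit-depth rays
    pts_ref = rays * depth[..., None]     # 3D points in the reference frame
    pts_src = pts_ref @ R.T + t           # same points in the source frame
    proj = pts_src @ K_src.T              # project with the source intrinsics
    return proj[..., :2] / proj[..., 2:3] # perspective divide
```

With identical intrinsics, no rotation, and a purely horizontal translation, the mapping reduces to the familiar disparity relation (focal length times baseline divided by depth), which is a quick sanity check for the conventions chosen above.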

Problem

Research questions and friction points this paper is trying to address.

Train MVS networks without real-world depth labels

Generate depth priors using foundation models

Mitigate error propagation in coarse-to-fine MVS

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a depth foundation model to generate a depth prior

Uses pseudo-supervised training built on the depth prior

Implements a depth prior-guided error correction strategy
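The page does not describe how the error correction strategy works internally, so the following is only a hypothetical sketch of one way a depth prior could guide a coarse-stage estimate before it seeds a finer stage. It assumes the prior is known only up to an unknown scale and shift (common for monocular depth models, but an assumption here), aligns it to the coarse depth by least squares on confident pixels, and then overwrites low-confidence pixels that disagree with the aligned prior. The function names and the thresholds `conf_thresh` and `rel_tol` are invented for this example.

```python
import numpy as np

def align_prior(prior, depth, mask):
    """Fit scale s and shift b so that s * prior + b ~= depth on `mask` (least squares)."""
    A = np.stack([prior[mask], np.ones(mask.sum())], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, depth[mask], rcond=None)
    return s * prior + b

def correct_coarse_depth(coarse, prior, conf, conf_thresh=0.5, rel_tol=0.1):
    """Hypothetical prior-guided correction of a coarse depth map.

    coarse : (H, W) coarse-stage depth estimate
    prior  : (H, W) depth prior, assumed valid up to scale and shift
    conf   : (H, W) per-pixel confidence of the coarse estimate in [0, 1]
    Returns the corrected depth and the boolean mask of replaced pixels.
    """
    reliable = conf > conf_thresh
    aligned = align_prior(prior, coarse, reliable)

    # Relative disagreement between the coarse estimate and the aligned prior.
    rel_err = np.abs(coarse - aligned) / np.maximum(aligned, 1e-8)

    # Only low-confidence pixels that also disagree with the prior are replaced.
    bad = (~reliable) & (rel_err > rel_tol)
    return np.where(bad, aligned, coarse), bad
```

In a coarse-to-fine network the corrected map would then center the depth hypotheses of the next stage, so that an early mistake is not simply refined around; whether DFM-MVS does this at the depth, hypothesis, or cost-volume level is not stated in this summary.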

🔎 Similar Papers

MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo