BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?

📅 2025-07-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Current depth foundation models (DFMs) are predominantly evaluated with geometric alignment metrics, which introduce representation bias and impede fair comparison across architectures. To address this, we propose BenchDepth, an application-oriented DFM benchmark that abandons alignment-based evaluation in favor of five downstream proxy tasks: depth completion, stereo matching, monocular 3D reconstruction, SLAM, and vision-language spatial understanding. This design enables representation-agnostic, task-driven performance assessment. We systematically evaluate eight state-of-the-art DFMs, revealing substantial inter-model variability in task-specific performance and exposing limitations in generalization and functional scope. BenchDepth establishes a more practical, reproducible, and extensible evaluation paradigm for DFM development and validation.

๐Ÿ“ Abstract
Depth estimation is a fundamental task in computer vision with diverse applications. Recent advancements in deep learning have led to powerful depth foundation models (DFMs), yet their evaluation remains challenging due to inconsistencies in existing protocols. Traditional benchmarks rely on alignment-based metrics that introduce biases, favor certain depth representations, and complicate fair comparisons. In this work, we propose BenchDepth, a new benchmark that evaluates DFMs through five carefully selected downstream proxy tasks: depth completion, stereo matching, monocular feed-forward 3D scene reconstruction, SLAM, and vision-language spatial understanding. Unlike conventional evaluation protocols, our approach assesses DFMs based on their practical utility in real-world applications, bypassing problematic alignment procedures. We benchmark eight state-of-the-art DFMs and provide an in-depth analysis of key findings and observations. We hope our work sparks further discussion in the community on best practices for depth model evaluation and paves the way for future research and advancements in depth estimation.
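For context, the alignment-based protocols the paper critiques typically fit a per-image scale and shift to the prediction via least squares before computing error, as in affine-invariant evaluation. A minimal NumPy sketch of that conventional procedure (illustrative only; not code from the paper, and function names are my own):

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Fit s, t minimizing ||s * pred + t - gt||^2 and return the aligned prediction.

    This is the per-image affine alignment step that BenchDepth argues
    biases comparisons between depth representations.
    """
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s * pred + t

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative depth error."""
    return float(np.mean(np.abs(pred - gt) / gt))
```

Because the alignment can absorb any affine distortion, a prediction that is badly off in absolute scale can still score near-perfectly after alignment, which is one reason such metrics complicate fair, application-level comparison.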
Problem

Research questions and friction points this paper is trying to address.

Existing evaluation protocols for depth foundation models are inconsistent
Traditional alignment-based metrics introduce biases and favor certain depth representations
No benchmark assesses DFMs by their practical utility in downstream applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates DFMs via five downstream proxy tasks
Assesses practical utility in real-world applications
Bypasses problematic alignment-based metrics