🤖 AI Summary
Existing monocular depth estimation benchmarks mostly measure accuracy and offer little systematic evaluation of robustness. To address this gap, we introduce PDE (Procedural Depth Evaluation), a benchmark that uses procedural generation to create synthetic 3D scenes under controlled, reproducible perturbations of four kinds: object, camera, material, and lighting changes. This enables fine-grained robustness assessment with direct control over the input variables. Experiments on state-of-the-art depth models show which perturbations are challenging for them. PDE provides a reproducible, robustness-oriented evaluation platform with open-source code and data, intended as a resource for research on more reliable monocular depth estimation.
📝 Abstract
Recent years have witnessed substantial progress on monocular depth estimation, particularly as measured by the success of large models on standard benchmarks. However, performance on standard benchmarks does not offer a complete assessment, because most evaluate accuracy but not robustness. In this work, we introduce PDE (Procedural Depth Evaluation), a new benchmark which enables systematic robustness evaluation. PDE uses procedural generation to create 3D scenes that test robustness to various controlled perturbations, including object, camera, material and lighting changes. Our analysis yields interesting findings on what perturbations are challenging for state-of-the-art depth models, which we hope will inform further research. Code and data are available at https://github.com/princeton-vl/proc-depth-eval.
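As a rough illustration of what a robustness measurement under controlled perturbations could look like, the sketch below compares a model's error on an unperturbed scene with its error on perturbed variants. It is not taken from the PDE code: the metric (absolute relative error, a common depth metric), the function names `abs_rel` and `error_increase`, and the toy depth values are all assumptions made for illustration.

```python
from statistics import mean


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with positive ground-truth depth.

    Assumption: the benchmark's actual metrics may differ; AbsRel is used here
    only because it is a widely used depth-estimation metric.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return mean(abs(p - g) / g for p, g in pairs)


def error_increase(base_err, perturbed_errs):
    """Average error over perturbed variants minus the unperturbed error."""
    return mean(perturbed_errs) - base_err


# Toy flattened depth maps in metres; the values are illustrative only.
gt = [1.0, 2.0, 4.0, 0.0]            # 0.0 marks an invalid pixel
pred_base = [1.1, 2.0, 4.0, 3.0]     # prediction on the unperturbed scene
pred_pert = [1.5, 2.0, 4.0, 3.0]     # prediction after a hypothetical lighting change

base = abs_rel(pred_base, gt)
pert = abs_rel(pred_pert, gt)
gap = error_increase(base, [pert])   # positive gap: the perturbation hurt accuracy
```

A positive `gap` indicates that the perturbation degraded accuracy relative to the unperturbed scene. Averaging over many variants of the same perturbation type gives a per-perturbation robustness score under these assumptions.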