🤖 AI Summary
Monocular depth estimation exhibits strong generalization but suffers from poor 3D consistency and a lack of absolute scale, limiting its applicability to high-fidelity 3D reconstruction. To address this, we propose a two-stage differentiable rendering optimization framework. In the first stage, we jointly leverage Structure-from-Motion (SfM) calibration and triangle-mesh parameterization to achieve global scale recovery. In the second stage, we perform joint photometric and geometric consistency optimization over local mesh patches, augmented with non-rigid regularization to balance detail fidelity and structural correctness. Our approach is the first to unify SfM-based scaling with mesh-based depth refinement, significantly improving depth-map density, accuracy, and cross-view consistency. Evaluated on challenging indoor scenes, it outperforms existing state-of-the-art methods, producing high-resolution, noise-robust, and geometrically and photometrically consistent absolute-depth maps, enabling robust novel-view synthesis and metric-accurate 3D reconstruction.
📝 Abstract
The accurate reconstruction of per-pixel depth for an image is vital for many tasks in computer graphics, computer vision, and robotics. In this paper, we present a novel approach to generate view-consistent and detailed depth maps from a set of posed images. We leverage advances in monocular depth estimation, which produce topologically complete but metrically inaccurate depth maps, and refine them in a two-stage optimization process based on a differentiable renderer. Taking the monocular depth map as input, we first scale this map to absolute distances based on structure-from-motion and transform the depths to a triangle surface mesh. We then refine this depth mesh in a local optimization, enforcing photometric and geometric consistency. Our evaluation shows that our method is able to generate dense, detailed, high-quality depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches. An overview and supplemental material for this project can be found at https://lorafib.github.io/ref_depth/.
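The first stage described above, scaling a relative monocular depth map to absolute distances using sparse SfM points, can be sketched as a least-squares scale fit. This is a minimal illustration, not the paper's implementation; the function name `recover_scale` and the closed-form single-scale model are assumptions for the sketch:

```python
import numpy as np

def recover_scale(mono_depth, sfm_depth, mask):
    """Least-squares global scale s minimizing ||s * mono - sfm||^2
    over pixels where sparse SfM depth is available (mask == True).

    Closed form: s = <mono, sfm> / <mono, mono> on the masked pixels.
    """
    m = mono_depth[mask]
    d = sfm_depth[mask]
    return float(np.dot(m, d) / np.dot(m, m))

# Toy example: a relative depth map off by a ground-truth scale of 2.5,
# with absolute depth known only at a few sparse SfM sample pixels.
rng = np.random.default_rng(0)
mono = rng.uniform(1.0, 5.0, size=(4, 4))   # relative monocular depths
sfm = 2.5 * mono                             # absolute depths from SfM
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True  # sparse sample locations

scale = recover_scale(mono, sfm, mask)
abs_depth = scale * mono  # metrically scaled depth map
```

In practice the sparse depths would come from reprojecting SfM landmarks into the image, and a robust or shift-augmented fit may be preferable; the scaled map is then converted to a triangle mesh for the second-stage refinement.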