🤖 AI Summary
Monocular depth estimation exhibits strong generalization but suffers from poor 3D consistency and a lack of absolute scale, limiting its applicability to high-fidelity 3D reconstruction. To address this, we propose a two-stage differentiable rendering optimization framework. In the first stage, we jointly leverage Structure-from-Motion (SfM) calibration and triangle-mesh parameterization to achieve global scale recovery. In the second stage, we perform joint photometric and geometric consistency optimization over local mesh patches, augmented with non-rigid regularization to balance detail fidelity and structural correctness. Our approach is the first to unify SfM-based scaling with mesh-based depth refinement, significantly improving depth-map density, accuracy, and cross-view consistency. Evaluated on challenging indoor scenes, it outperforms existing state-of-the-art methods, producing high-resolution, noise-robust, and geometrically and photometrically consistent absolute-depth maps, enabling robust novel-view synthesis and metric-accurate 3D reconstruction.
📝 Abstract
The accurate reconstruction of per-pixel depth for an image is vital for many tasks in computer graphics, computer vision, and robotics. In this paper, we present a novel approach to generate view-consistent and detailed depth maps from a set of posed images. We leverage advances in monocular depth estimation, which produce topologically complete but metrically inaccurate depth maps, and refine them in a two-stage optimization process based on a differentiable renderer. Taking the monocular depth map as input, we first scale this map to absolute distances based on structure-from-motion and transform the depths to a triangle surface mesh. We then refine this depth mesh in a local optimization, enforcing photometric and geometric consistency. Our evaluation shows that our method is able to generate dense, detailed, high-quality depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches. An overview and supplemental material for this project can be found at https://lorafib.github.io/ref_depth/.
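The first stage described above, scaling a relative monocular depth map to absolute distances using sparse SfM points, can be sketched as a least-squares scale fit. This is a minimal illustration, not the paper's implementation; the function name `recover_scale` and the closed-form single-scale model are assumptions for the sketch:

```python
import numpy as np

def recover_scale(mono_depth, sfm_depth, mask):
    """Least-squares global scale s minimizing ||s * mono - sfm||^2
    over pixels where sparse SfM depth is available (mask == True).

    Closed form: s = <mono, sfm> / <mono, mono> on the masked pixels.
    """
    m = mono_depth[mask]
    d = sfm_depth[mask]
    return float(np.dot(m, d) / np.dot(m, m))

# Toy example: a relative depth map off by a ground-truth scale of 2.5,
# with absolute depth known only at a few sparse SfM sample pixels.
rng = np.random.default_rng(0)
mono = rng.uniform(1.0, 5.0, size=(4, 4))   # relative monocular depths
sfm = 2.5 * mono                             # absolute depths from SfM
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True  # sparse sample locations

scale = recover_scale(mono, sfm, mask)
abs_depth = scale * mono  # metrically scaled depth map
```

In practice the sparse depths would come from reprojecting SfM landmarks into the image, and a robust or shift-augmented fit may be preferable; the scaled map is then converted to a triangle mesh for the second-stage refinement.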