GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation

📅 2025-10-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Monocular metric depth estimation suffers from inherent scale ambiguity and is especially unreliable in scenes that offer weak geometric cues, such as those with translucent or specular surfaces. To address this, we propose a training-free framework that treats depth estimation as an inverse problem: a pretrained latent diffusion model (LDM), conditioned on the input RGB image, serves as a generative prior, and differentiable stereo-geometric constraints (e.g., epipolar consistency and disparity-depth mapping) act as regularization terms that guide recovery of the scale and shift needed for absolute depth. Because no supervised fine-tuning or retraining is required, the method plugs directly into existing diffusion-based monocular depth estimation (DB-MDE) models. Across indoor, outdoor, and complex scenes it matches or surpasses state-of-the-art methods, with the clearest gains on translucent and specular surfaces. The work combines generative modeling with geometric reasoning, enabling geometry-aware depth reconstruction without task-specific training.
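The disparity-depth mapping mentioned above is, for a rectified stereo pair, the standard relation depth = focal length × baseline / disparity. The following is a minimal sketch of that relation only, not the paper's implementation; the function name `disparity_to_depth` and the parameters `focal_px`, `baseline_m`, and `min_disp` are hypothetical names chosen for illustration.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert rectified-stereo disparity (pixels) to metric depth (meters).

    Assumes a rectified pair with focal length `focal_px` in pixels and
    baseline `baseline_m` in meters. Pixels whose disparity is at or below
    `min_disp` have no usable depth and are returned as NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Because depth is inversely proportional to disparity, small disparity errors on distant surfaces translate into large depth errors, which is one reason a sketch like this masks near-zero disparities instead of dividing through.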

📝 Abstract

We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance. While existing DB-MDE methods excel at predicting relative depth, estimating absolute metric depth remains challenging due to scale ambiguities in single-image scenarios. To address this, we reframe depth estimation as an inverse problem, leveraging pretrained latent diffusion models (LDMs) conditioned on RGB images, combined with stereo-based geometric constraints, to learn scale and shift for accurate depth recovery. Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments. Extensive experiments demonstrate that our approach matches or surpasses state-of-the-art methods, particularly in challenging scenarios involving translucent and specular surfaces, all without requiring retraining.
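To make "learn scale and shift" concrete: a relative depth map d_rel becomes metric through an affine map d_metric ≈ s · d_rel + t. The sketch below fits s and t by ordinary least squares against sparse metric depth values (for example, ones derived from stereo). It is a simplified closed-form stand-in, not the authors' procedure, which applies the stereo constraints within the diffusion framework; the function name `fit_scale_shift` and its arguments are hypothetical, and it assumes the affine relation holds in depth space rather than inverse-depth space.

```python
import numpy as np

def fit_scale_shift(relative_depth, metric_depth, mask=None):
    """Least-squares fit of (scale, shift) so that
    scale * relative_depth + shift approximates metric_depth.

    Non-finite pixels in either input are ignored, as are pixels
    excluded by the optional boolean `mask`.
    """
    rel = np.asarray(relative_depth, dtype=np.float64).ravel()
    met = np.asarray(metric_depth, dtype=np.float64).ravel()
    valid = np.isfinite(rel) & np.isfinite(met)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool).ravel()
    if valid.sum() < 2:
        raise ValueError("need at least two valid pixels to fit scale and shift")
    # Design matrix [d_rel, 1] for the affine model s * d_rel + t.
    A = np.stack([rel[valid], np.ones(int(valid.sum()))], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, met[valid], rcond=None)
    return float(scale), float(shift)
```

Given a fit, `scale * relative_depth + shift` yields a metric depth map. A plain least-squares fit is sensitive to outliers in the reference depth, so a robust loss or a validity mask may be preferable on translucent or specular regions where stereo matching itself is unreliable.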

Problem

Research questions and friction points this paper is trying to address.

Addresses metric depth estimation from single images

Resolves scale ambiguities using stereo geometric constraints

Enhances pretrained diffusion models without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided diffusion for metric depth estimation

Stereo vision constraints resolve scale ambiguities

Training-free integration into existing depth frameworks

🔎 Similar Papers

Self-supervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion