CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement

📅 2024-04-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To improve the accuracy and robustness of multi-view depth estimation across diverse camera configurations (e.g., varying relative poses and lens types), this paper proposes CHOSEN, a plug-and-play iterative depth refinement framework. Given an initial depth map, the method repeatedly resamples a set of depth hypotheses and selects the best one, using contrastive learning in the hypothesis space together with a carefully designed hypothesis feature to separate positive from negative hypotheses. It adapts automatically to the metric or intrinsic scale determined by the capture system. Integrated into a simple baseline multi-view stereo pipeline, it delivers depth and surface normal accuracy that compares favorably with many current deep-learning-based multi-view stereo pipelines.

📝 Abstract
We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, and automatically adapts to different metric or intrinsic scales determined by the capture system. The key to our approach is the application of contrastive learning in an appropriate solution space and a carefully designed hypothesis feature, based on which positive and negative hypotheses can be effectively distinguished. Integrated in a simple baseline multi-view stereo pipeline, CHOSEN delivers impressive quality in terms of depth and normal accuracy compared to many current deep learning based multi-view stereo pipelines.
Problem

Research questions and friction points this paper is trying to address.

Initial multi-view depth estimates need iterative refinement to become accurate
Capture systems differ in camera positioning, lenses, and metric or intrinsic scale
Good and bad depth hypotheses must be reliably told apart during selection
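
The resample-and-select loop described in the abstract can be illustrated with a toy, single-pixel sketch. This is not the authors' implementation: `refine_depth`, `toy_score`, the uniform perturbation scheme, the halving radius, and all parameter values are assumptions made for illustration. In CHOSEN the score would come from a learned hypothesis feature, whereas here it is a stand-in function. Perturbations are taken relative to the current depth, which is one simple way to stay independent of the metric scale.

```python
import random


def refine_depth(initial_depth, score_fn, num_iters=4, num_hyps=8,
                 rel_radius=0.2, seed=0):
    """Toy per-pixel refinement: resample hypotheses, keep the best-scoring one.

    All names and defaults are hypothetical; score_fn stands in for a
    learned hypothesis scorer.
    """
    rng = random.Random(seed)
    depth = initial_depth
    radius = rel_radius
    for _ in range(num_iters):
        # Candidates: the current estimate plus perturbations proportional
        # to the current depth (relative, not absolute, offsets).
        hyps = [depth] + [
            depth * (1.0 + rng.uniform(-radius, radius))
            for _ in range(num_hyps)
        ]
        # Selection step: keep the hypothesis with the highest score.
        depth = max(hyps, key=score_fn)
        # Shrink the search window so later iterations refine locally.
        radius *= 0.5
    return depth


def toy_score(d, true_depth=2.0):
    """Stand-in scorer: higher is better, peaks at the (assumed) true depth."""
    return -abs(d - true_depth)


refined = refine_depth(1.7, toy_score)
```

Because the current estimate is always kept among the candidates, the score in this sketch can never get worse from one iteration to the next.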
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning for depth hypothesis selection
Adaptive resampling for multi-view capture systems
Feature design for effective hypothesis distinction
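
As a rough illustration of contrastive hypothesis selection, the sketch below uses a generic InfoNCE-style loss: one hypothesis is treated as the positive and the rest as negatives, and the loss is the softmax cross-entropy of the positive's score. The summary does not state the paper's exact loss or hypothesis feature, so the function names, the temperature parameter, and the choice of positive are all assumptions, not the authors' formulation.

```python
import math


def contrastive_selection_loss(scores, positive_index, temperature=1.0):
    """InfoNCE-style loss over per-hypothesis scores (hypothetical helper).

    Equals -log softmax(scores / temperature)[positive_index].
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_sum_exp - scaled[positive_index]


def select_hypothesis(scores):
    """At inference time, pick the index of the highest-scoring hypothesis."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Assumed setup: hypothesis 0 is the positive (e.g. closest to ground truth).
good_loss = contrastive_selection_loss([5.0, 0.0, 0.0], positive_index=0)
bad_loss = contrastive_selection_loss([0.0, 5.0, 0.0], positive_index=0)
best = select_hypothesis([0.1, 0.9, 0.3])
```

The loss is small when the positive hypothesis already has the highest score and large when a negative outranks it, which is what pushes a learned scorer to separate the two groups.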
Authors
Di Qiu (Google AR)
Yinda Zhang (Google Research; Computer Vision, Computer Graphics, Deep Learning, Scene Understanding, Digital Human)
T. Beeler (Google AR)
V. Tankovich
Christian Hane (Meta Reality Labs)
S. Fanello (Google AR)
Christoph Rhemann (Google AR)
Sergio Orts Escolano (Google AR)