BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular depth estimation (MDE) suffers from limited availability and poor quality of ground-truth depth annotations, severely constraining model robustness and cross-domain generalization. Method: We propose a reinforcement learning (RL)-driven depth-to-image (D2I) generation framework featuring a novel RL-optimized GAN generator that integrates autoregressive priors and geometric consistency constraints to synthesize over 20 million geometrically accurate, high-fidelity depth-image pairs. We further introduce a hybrid supervision paradigm combining teacher-generated pseudo-labels with sparse real depth annotations, enhanced by knowledge distillation and multi-stage loss optimization. Contribution/Results: Our approach achieves significant improvements over state-of-the-art methods across multiple benchmarks, particularly excelling in modeling complex structures and recovering fine-grained details. It substantially enhances cross-domain generalization and robustness under domain shift and challenging scene geometries.
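The hybrid supervision paradigm described above can be sketched as a per-pixel loss that applies sparse real depth annotations where they exist and teacher pseudo-labels on all remaining pixels. The function below is a minimal illustration, not the paper's implementation; the L1 penalty, the `alpha` weight, and all names are assumptions.

```python
import numpy as np

def hybrid_supervision_loss(pred, teacher_pred, gt, gt_mask, alpha=0.5):
    """Hypothetical hybrid loss: sparse ground-truth supervision on pixels
    where gt_mask is True, teacher pseudo-label distillation elsewhere.
    alpha weights the pseudo-label term against the ground-truth term."""
    # L1 loss on the sparse set of pixels with valid ground-truth depth
    gt_loss = np.abs(pred - gt)[gt_mask].mean() if gt_mask.any() else 0.0
    # Distillation loss on the remaining pixels, using teacher predictions
    pl_mask = ~gt_mask
    pl_loss = np.abs(pred - teacher_pred)[pl_mask].mean() if pl_mask.any() else 0.0
    return gt_loss + alpha * pl_loss
```

In practice such a scheme lets dense pseudo-labels fill in supervision where real sensors provide only sparse or noisy depth, while the real annotations anchor the scale.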

📝 Abstract
Monocular Depth Estimation (MDE) is a foundational task in computer vision. Traditional methods are limited by the scarcity and quality of training data, which hinders their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic, geometrically accurate RGB images, each intrinsically paired with its ground-truth depth, from diverse source depth maps. We then train our depth estimation model on this dataset, employing a hybrid supervision strategy that combines teacher pseudo-labels with ground-truth depth for comprehensive and robust training. This data generation and training paradigm gives BRIDGE unprecedented scale and domain diversity; it consistently outperforms existing state-of-the-art approaches both quantitatively and in capturing complex scene detail, fostering general and robust depth features. Code and models are available at https://dingning-liu.github.io/bridge.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Overcoming data scarcity in monocular depth estimation tasks
Generating realistic RGB-depth pairs with geometric accuracy
Improving depth feature robustness across diverse domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-optimized depth-to-image generation framework
Hybrid supervision with teacher pseudo-labels
Synthesizes 20M realistic RGB-depth image pairs
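One way to read the "geometric accuracy" requirement above: an image generated from a depth map should yield that same depth back when passed through a depth estimator, and the discrepancy can serve as a consistency reward for the RL-optimized generator. The sketch below is a hedged illustration under that assumption; the absolute-relative-error metric and all names are hypothetical, not the paper's definition.

```python
import numpy as np

def geometric_consistency_reward(source_depth, reestimated_depth, eps=1e-6):
    """Hypothetical reward: negative mean absolute relative error between
    the depth map the image was generated from and the depth re-estimated
    from the generated image. Perfect consistency gives a reward of 0."""
    rel_err = np.abs(reestimated_depth - source_depth) / (source_depth + eps)
    return -rel_err.mean()
```

A reward of this shape could be combined with a realism term (e.g. a discriminator score) when updating the generator.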
Authors

Dingning Liu, Fudan University (Multimodal, Reinforcement Learning, 3D Generation, Robotics, AI4Science)
Haoyu Guo, Shanghai AI Lab (Computer Vision, 3D Vision)
Jingyi Zhou, Shanghai Artificial Intelligence Laboratory
Tong He, Shanghai Artificial Intelligence Laboratory