BokehDepth: Enhancing Monocular Depth Estimation through Bokeh Generation

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular depth estimation suffers from degraded accuracy in weak-texture, long-range, and geometrically ambiguous regions—precisely where defocus (bokeh) cues are most discriminative; conversely, noisy depth maps severely impair bokeh rendering quality. To address this, we propose an unsupervised two-stage framework that, for the first time, models defocus as a physics-driven geometric prior to decouple and enhance depth estimation. First, we design a physics-guided, controllable bokeh generator that leverages a pre-trained image-editing backbone for high-fidelity defocus synthesis. Second, we introduce a lightweight defocus-aware feature aggregation module that enables cross-modal fusion between defocus and depth features. Our method is plug-and-play compatible with mainstream monocular depth encoders. Extensive experiments demonstrate significant improvements in metric depth accuracy and robustness across multiple benchmarks, while simultaneously producing more natural and photorealistic bokeh effects.
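The "physics-driven" coupling between bokeh and depth comes from thin-lens imaging geometry: the circle-of-confusion (CoC) diameter of a scene point depends directly on its depth relative to the focus plane. The paper's generator details are not public here, but the underlying relation can be sketched with the standard thin-lens CoC formula (all symbols below are the textbook ones, not names from the paper):

```python
import numpy as np

def coc_diameter(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter.

    depth      : object distance(s) in metres (scalar or array)
    focus_dist : distance of the in-focus plane in metres
    focal_len  : lens focal length in metres
    aperture   : aperture (entrance pupil) diameter in metres

    Standard relation: c = A * f * |z - z_f| / (z * (z_f - f)).
    """
    depth = np.asarray(depth, dtype=np.float64)
    return aperture * focal_len * np.abs(depth - focus_dist) / (
        depth * (focus_dist - focal_len)
    )

# A point on the focus plane is sharp; blur grows away from it.
print(coc_diameter(2.0, 2.0, 0.05, 0.01))   # 0.0
print(coc_diameter(4.0, 2.0, 0.05, 0.01) > 0.0)   # True
```

Because CoC is a monotone function of the depth offset from the focus plane (on either side of it), a stack of images rendered at calibrated bokeh strengths encodes depth-sensitive variation that a downstream network can exploit.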

📝 Abstract
Bokeh and monocular depth estimation are tightly coupled through the same lens imaging geometry, yet current methods exploit this connection in incomplete ways. High-quality bokeh rendering pipelines typically depend on noisy depth maps, which amplify estimation errors into visible artifacts, while modern monocular metric depth models still struggle on weakly textured, distant, and geometrically ambiguous regions where defocus cues are most informative. We introduce BokehDepth, a two-stage framework that decouples bokeh synthesis from depth prediction and treats defocus as an auxiliary, supervision-free geometric cue. In Stage-1, a physically guided, controllable bokeh generator, built on a powerful pretrained image-editing backbone, produces depth-free bokeh stacks with calibrated bokeh strength from a single sharp input. In Stage-2, a lightweight defocus-aware aggregation module plugs into existing monocular depth encoders, fuses features along the defocus dimension, and exposes stable depth-sensitive variations while leaving the downstream decoder unchanged. Across challenging benchmarks, BokehDepth improves visual fidelity over depth-map-based bokeh baselines and consistently boosts the metric accuracy and robustness of strong monocular depth foundation models.
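The Stage-2 module "fuses features along the defocus dimension" of the bokeh stack. The paper's exact architecture is not reproduced here, but the idea can be illustrated with a minimal, hypothetical numpy sketch: given encoder features for each bokeh-strength level, aggregate across that level axis and expose a per-pixel variation statistic, which tends to be large where blur (and hence depth) changes quickly:

```python
import numpy as np

def aggregate_defocus_stack(feature_stack):
    """Toy defocus-aware aggregation (hypothetical sketch, not the paper's module).

    feature_stack : array of shape [S, C, H, W], encoder features for
                    S calibrated bokeh-strength levels.
    Returns an array of shape [C + 1, H, W]: the mean feature over the
    defocus dimension, plus one channel of per-pixel variance across
    defocus levels as a depth-sensitive signal.
    """
    mean = feature_stack.mean(axis=0)                # [C, H, W]
    var = feature_stack.var(axis=0).mean(axis=0)     # [H, W], averaged over C
    return np.concatenate([mean, var[None]], axis=0)

# Example: 4 bokeh levels, 8 feature channels, 16x16 spatial grid.
stack = np.random.default_rng(0).random((4, 8, 16, 16))
fused = aggregate_defocus_stack(stack)
print(fused.shape)   # (9, 16, 16)
```

Because the fused output keeps the original spatial resolution and only appends channels, such a module can sit between an existing encoder and an unchanged decoder, matching the plug-and-play claim.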
Problem

Research questions and friction points this paper is trying to address.

Monocular depth accuracy degrades in weakly textured, distant, and geometrically ambiguous regions
Bokeh rendering pipelines depend on noisy depth maps, which amplify estimation errors into visible artifacts
How to exploit defocus as a supervision-free geometric cue for depth estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework decouples bokeh synthesis from depth prediction
Controllable bokeh generator produces depth-free bokeh stacks from sharp input
Defocus-aware aggregation module fuses features to enhance depth estimation