Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of high-fidelity 3D volumetric reconstruction from a single image without ground-truth 3D annotations or multi-view supervision. We propose a “diffusion-depth distillation” framework that leverages a pre-trained 2D diffusion model and a monocular depth estimator to generate geometric priors; implicit geometric knowledge is then distilled into a lightweight, feed-forward reconstruction network. To our knowledge, this is the first approach to jointly exploit 2D diffusion models and depth priors for monocular 3D reconstruction—eliminating reliance on costly 3D ground truth or multi-view data. Evaluated on KITTI-360 and Waymo, our method achieves performance on par with or surpassing state-of-the-art multi-view supervised methods, while demonstrating superior robustness and generalization, particularly in dynamic scenes.

📝 Abstract
Volumetric scene reconstruction from a single image is crucial for a broad range of applications like autonomous driving and robotics. Recent volumetric reconstruction methods achieve impressive results, but generally require expensive 3D ground truth or multi-view supervision. We propose to leverage pre-trained 2D diffusion models and depth prediction models to generate synthetic scene geometry from a single image. This can then be used to distill a feed-forward scene reconstruction model. Our experiments on the challenging KITTI-360 and Waymo datasets demonstrate that our method matches or outperforms state-of-the-art baselines that use multi-view supervision, and offers unique advantages, for example in dynamic scenes.
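The core idea in the abstract, that priors from pre-trained 2D models can supply pseudo-labels for training a lightweight reconstruction network, can be illustrated with a toy sketch. The code below is not the paper's implementation: `estimate_depth` and `depth_to_occupancy` are hypothetical stand-ins for the pre-trained depth prior and the geometry synthesis step, and the "student" is reduced to a linear map trained with plain gradient descent on a distillation (MSE) loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_depth(image):
    """Hypothetical stand-in for a pre-trained monocular depth prior."""
    return image.mean(axis=-1)  # (H, W) pseudo-depth in [0, 1]

def depth_to_occupancy(depth, num_bins=16):
    """Quantize depth into a coarse per-pixel occupancy pseudo-label.

    A toy proxy for the synthetic scene geometry distilled from the priors:
    each pixel activates the depth bin its predicted depth falls into.
    """
    bins = np.clip((depth * num_bins).astype(int), 0, num_bins - 1)
    occ = np.zeros(depth.shape + (num_bins,))
    h, w = depth.shape
    occ[np.arange(h)[:, None], np.arange(w)[None, :], bins] = 1.0
    return occ

# Single input image; the geometric pseudo-label is computed once from priors.
image = rng.random((8, 8, 3))
target = depth_to_occupancy(estimate_depth(image)).reshape(-1, 16)
feats = image.reshape(-1, 3)  # trivial per-pixel "features"

# "Student": a linear map distilled to mimic the prior-derived geometry.
W = np.zeros((3, 16))
losses = []
for _ in range(100):
    pred = feats @ W
    losses.append(float(np.mean((pred - target) ** 2)))
    grad = 2.0 * feats.T @ (pred - target) / feats.shape[0]
    W -= 0.1 * grad  # gradient step on the distillation (MSE) loss
```

The distillation loss decreases over the loop: the student never sees 3D ground truth or multiple views, only geometry synthesized from the 2D priors, which is the supervision-free training signal the paper builds on (with far richer priors and networks).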
Problem

Research questions and friction points this paper is trying to address.

Monocular 3D reconstruction from single images
Reducing dependency on expensive 3D supervision
Improving dynamic scene reconstruction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage pre-trained 2D diffusion models for geometric priors
Distill a feed-forward scene reconstruction model
Match or outperform multi-view-supervised baselines