Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

📅 2024-06-05

🏛️ arXiv.org

📈 Citations: 7

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Existing single-image-to-3D methods predominantly adopt a two-stage decoupled paradigm—first generating multi-view images, then reconstructing 3D geometry—leading to training-inference inconsistency and geometric distortion. This work proposes an end-to-end recursive diffusion framework, introducing the first 3D-aware self-feedback mechanism to jointly optimize multi-view synthesis and 3D reconstruction in a dynamically co-adaptive manner. By integrating differentiable rendering, 3D-aware map-conditioned guidance, and self-conditioned recursive optimization, the model implicitly enforces cross-view geometric consistency during training. Evaluated on multiple benchmarks, our method significantly outperforms both staged pipelines and separately trained approaches: Chamfer Distance improves by 21.3%, F-Score increases by 18.7%, while visual realism and fine-grained geometric fidelity are simultaneously enhanced.

Technology Category

Application Category

📝 Abstract

Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency.Experiments show that our framework outperforms separation of these two stages and existing methods that combine them at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/

Problem

Research questions and friction points this paper is trying to address.

Unifies multi-view image generation and 3D reconstruction into one framework

Reduces data bias and improves 3D reconstruction quality

Enhances geometric consistency via 3D-aware recursive diffusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified 3D generation via recursive diffusion

Joint training with self-conditioning mechanism

3D-aware feedback improves geometric consistency

🔎 Similar Papers

No similar papers found.