Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

📅 2024-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing image-to-3D generation pipelines suffer from 3D inconsistency across generated views and low reconstruction resolution, which leads to incorrect geometry and blurry textures in the synthesized 3D assets. To address these issues, the paper proposes Magic-Boost, a refinement method based on multi-view conditioned diffusion. First, it introduces a multi-view conditioned diffusion model that extracts 3D priors from the synthesized multi-view images and uses them to synthesize high-fidelity novel views. Second, it introduces an iterative-update strategy that uses this model to guide a fast optimization process refining the coarse result. The refinement takes approximately 15 minutes and enriches local detail in both geometry and texture. Compared with the coarse outputs of the prevailing pipeline (multi-view generation followed by feed-forward sparse-view reconstruction), the refined assets show richer geometric and textural detail.

📝 Abstract

Benefiting from the rapid development of 2D diffusion models, 3D content generation has witnessed significant progress. One promising solution is to fine-tune pre-trained 2D diffusion models to produce multi-view images and then reconstruct them into 3D assets via feed-forward sparse-view reconstruction models. However, limited by the 3D inconsistency in the generated multi-view images and the low reconstruction resolution of the feed-forward reconstruction models, the generated 3D assets still suffer from incorrect geometry and blurry textures. To address this problem, we present a multi-view based refinement method, named Magic-Boost, to further refine the generation results. In detail, we first propose a novel multi-view conditioned diffusion model, which extracts 3D priors from the synthesized multi-view images to synthesize high-fidelity novel-view images. We then introduce a novel iterative-update strategy that uses this model to provide precise guidance for refining the coarse generated results through a fast optimization process. Conditioned on the strong 3D priors extracted from the synthesized multi-view images, Magic-Boost is capable of providing precise optimization guidance that aligns well with the coarse generated 3D assets, enriching the local detail in both geometry and texture within a short time ($\sim 15$ min). Extensive experiments show that Magic-Boost greatly enhances the coarse generated inputs and generates high-quality 3D assets with rich geometric and textural details. (Project Page: https://magic-research.github.io/magic-boost/)
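
The abstract describes the refinement only at a high level. As a rough illustration of the general pattern it names (optimizing a coarse result against guidance that is regenerated as the result improves), here is a toy sketch. Everything in it is a hypothetical stand-in chosen for illustration: `toy_guidance`, `refine`, the list-of-floats "asset", `detail_target`, and the squared-error update are assumptions, not the paper's model, loss, or schedule.

```python
def toy_guidance(current, detail_target, strength=0.5):
    """Hypothetical stand-in for a guidance model.

    Proposes per-parameter targets that sit part-way between the current
    estimate and the detail the guidance can supply (detail_target).
    """
    return [c + strength * (d - c) for c, d in zip(current, detail_target)]


def refine(coarse, detail_target, rounds=5, steps_per_round=20, lr=0.1):
    """Refine a coarse estimate with periodically refreshed guidance targets."""
    params = list(coarse)
    for _ in range(rounds):
        # Refresh the targets from the current state (the "iterative update"
        # in this toy; the real strategy is not specified in the abstract).
        targets = toy_guidance(params, detail_target)
        for _ in range(steps_per_round):
            # Gradient step on 0.5 * (p - t)^2, whose gradient is (p - t).
            params = [p - lr * (p - t) for p, t in zip(params, targets)]
    return params


def l2_error(a, b):
    """Euclidean distance between two equal-length parameter lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy data: a flat "coarse" estimate and an assumed detail-rich target.
coarse = [0.0, 0.0, 0.0, 0.0]
detail_target = [1.0, -0.5, 0.25, 2.0]
refined = refine(coarse, detail_target)
```

In this toy, each round moves the estimate only part of the way, and the next round's targets are computed from the improved estimate, so the guidance stays close to the current result instead of being fixed once at the start.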

Problem

Research questions and friction points this paper is trying to address.

3D image generation

shape accuracy

image clarity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Magic-Boost

3D model generation

multi-angle 2D images

🔎 Similar Papers

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion