Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

📅 2024-04-09
🤖 AI Summary
Existing 3D content generation methods that reconstruct assets from diffusion-generated multi-view images suffer from incorrect geometry and blurry textures, caused by 3D inconsistency across the generated views and the low resolution of feed-forward sparse-view reconstruction models. To address these issues, the paper proposes Magic-Boost, a refinement framework based on multi-view conditioned diffusion. First, a novel multi-view conditioned diffusion model extracts 3D priors from the synthesized multi-view images and uses them to generate high-fidelity novel views. Second, an iterative-update strategy adapts this model to provide precise guidance that aligns with the coarse generated asset, refining its geometry and texture through a fast optimization of roughly 15 minutes. Compared with the prevailing two-stage paradigm (multi-view generation followed by feed-forward reconstruction), the method adds rich geometric and textural detail while keeping the refinement step efficient.

📝 Abstract
Benefiting from the rapid development of 2D diffusion models, 3D content generation has witnessed significant progress. One promising solution is to finetune pre-trained 2D diffusion models to produce multi-view images and then reconstruct them into 3D assets via feed-forward sparse-view reconstruction models. However, limited by the 3D inconsistency of the generated multi-view images and the low reconstruction resolution of the feed-forward reconstruction models, the generated 3D assets still suffer from incorrect geometries and blurry textures. To address this problem, we present a multi-view based refinement method, named Magic-Boost, to further refine the generation results. Specifically, we first propose a novel multi-view conditioned diffusion model that extracts 3D priors from the synthesized multi-view images to synthesize high-fidelity novel-view images, and then introduce a novel iterative-update strategy that adapts this model to provide precise guidance for refining the coarse generated results through a fast optimization process. Conditioned on the strong 3D priors extracted from the synthesized multi-view images, Magic-Boost provides precise optimization guidance that aligns well with the coarse generated 3D assets, enriching local detail in both geometry and texture within a short time (~15 min). Extensive experiments show that Magic-Boost greatly enhances the coarse generated inputs and generates high-quality 3D assets with rich geometric and textural details. (Project Page: https://magic-research.github.io/magic-boost/)
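The abstract describes the refinement only at a high level. As a rough illustration of the general shape of an iterative-update refinement loop, here is a toy Python sketch under loudly stated assumptions: the "asset" is a plain list of floats, rendering is the identity, and the multi-view conditioned diffusion model is replaced by a hypothetical `guide` function that proposes an improved target from the current state. The names `refine`, `guide`, `toy_guide`, and `DETAIL` are illustrative inventions, and this is not Magic-Boost's actual algorithm, loss, or schedule.

```python
from typing import Callable, List

Vector = List[float]


def refine(
    coarse: Vector,
    guide: Callable[[Vector], Vector],
    outer_steps: int = 5,
    inner_steps: int = 20,
    lr: float = 0.1,
) -> Vector:
    """Toy iterative-update refinement loop (hypothetical, not Magic-Boost's code).

    `coarse` stands in for the parameters of a coarse 3D asset and `guide`
    stands in for a conditioned diffusion model that proposes an improved
    target from the current state.
    """
    params = list(coarse)
    for _ in range(outer_steps):
        # Regenerate the guidance target from the current estimate so that the
        # guidance stays aligned with the asset being optimized.
        target = guide(params)
        for _ in range(inner_steps):
            # Plain gradient descent on 0.5 * ||params - target||^2,
            # whose gradient with respect to params is (params - target).
            params = [p - lr * (p - t) for p, t in zip(params, target)]
    return params


# Hypothetical stand-in for the guidance model: it proposes a target halfway
# between the current estimate and a hidden "detailed" asset.
DETAIL = [1.0, -2.0, 0.5]


def toy_guide(x: Vector) -> Vector:
    return [xi + 0.5 * (di - xi) for xi, di in zip(x, DETAIL)]


if __name__ == "__main__":
    refined = refine([0.0, 0.0, 0.0], toy_guide)
    # Entries end up much closer to DETAIL than the all-zero starting point.
    print(refined)
```

The point the sketch tries to capture is that guidance targets are recomputed from the current estimate rather than fixed once, so the guidance stays consistent with the asset being refined; the abstract does not specify the real method's renderer, losses, or update schedule.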
Problem

Research questions and friction points this paper is trying to address.

3D content generation
geometric accuracy
texture clarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Magic-Boost
3D model generation
multi-view 2D images
Authors
Fan Yang
College of Computing and Data Science, Nanyang Technological University (NTU), 639798 Singapore
Jianfeng Zhang
ByteDance Inc., 078881 Singapore
Yichun Shi
ByteDance
Computer Vision, Machine Learning
Bowen Chen
ByteDance Inc., 100098 Beijing, China
Chenxu Zhang
ByteDance Inc.
Computer Graphics, Computer Vision, AI
Huichao Zhang
Shanghai Jiaotong University
3D computer vision, VLM
Xiaofeng Yang
College of Computing and Data Science, Nanyang Technological University (NTU), 639798 Singapore
Xiu Li
Bytedance Seed
Computer Vision, Computer Graphics, 3D Vision
Jiashi Feng
ByteDance Inc.
computer vision, machine learning
Guosheng Lin
Nanyang Technological University
Computer Vision, Machine Learning