Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

📅 2024-12-09
🏛️ arXiv.org
📈 Citations: 7
Influential: 1
🤖 AI Summary
This work addresses two key obstacles to high-fidelity 3D object and clothed-avatar generation from a single RGB image: weak 3D consistency and poor generalization. It proposes a collaborative framework that unifies 2D and 3D diffusion models, synchronizing a pre-trained 2D diffusion model with a 3D diffusion model at both training and sampling time. The synergy is bidirectional: the 2D model, pretrained on large datasets, contributes strong shape priors and generalization to unseen inputs, while the 3D model enforces geometric and textural coherence across the sampled views. Experiments show significant gains in geometric accuracy and texture fidelity, robust handling of diverse garments and complex human-clothing configurations, and strong generalization to unseen poses and clothing styles. Code and pretrained models will be publicly released.
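The alternation the summary describes, per-view 2D denoising interleaved with a 3D step that pulls all views toward one shared reconstruction, can be caricatured with a toy numerical sketch. Every name and model below (`denoise_2d`, `fuse_and_render_3d`, the mean-fusion "reconstruction") is a hypothetical stand-in for illustration only, not the authors' actual networks:

```python
import numpy as np

# Toy sketch of alternating 2D-denoise / 3D-reconstruct sampling.
# All models here are crude placeholders; the real method uses pretrained
# 2D and 3D diffusion models synchronized at sampling time.

rng = np.random.default_rng(0)
N_VIEWS, H, W = 4, 8, 8
TARGET = rng.normal(size=(N_VIEWS, H, W))  # stand-in for the clean multi-view images

def denoise_2d(x_t, t):
    """Placeholder for one 2D diffusion denoising step on each view."""
    return x_t + (TARGET - x_t) / (t + 1)  # drift toward the clean estimate

def fuse_and_render_3d(views):
    """Placeholder for the 3D model: fuse views into one shared
    reconstruction and re-render it, pulling views toward 3D consistency."""
    shared = views.mean(axis=0, keepdims=True)  # crude "reconstruction"
    return 0.5 * views + 0.5 * shared

def joint_sampling(steps=50):
    x_t = rng.normal(size=(N_VIEWS, H, W))  # start from pure noise
    for t in reversed(range(steps)):
        x_t = denoise_2d(x_t, t)         # 2D prior: per-view refinement
        x_t = fuse_and_render_3d(x_t)    # 3D prior: cross-view coherence
    return x_t

views = joint_sampling()

# Cross-view disagreement with vs. without the 3D consistency step
# (2D-only sampling would land exactly on TARGET in this toy setup):
spread_joint = np.abs(views - views.mean(axis=0)).mean()
spread_2d_only = np.abs(TARGET - TARGET.mean(axis=0)).mean()
print(spread_joint < spread_2d_only)  # the 3D step reduces view disagreement
```

The point of the toy: the 2D step alone yields views that need not agree with each other, while interleaving the 3D fuse-and-render step at every iteration shrinks cross-view disagreement, which is the consistency benefit the paper attributes to the synergy.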

📝 Abstract
Creating realistic 3D objects and clothed avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful priors from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot guarantee that the generated multi-view images are 3D consistent. In this paper, we propose Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy. We leverage a pre-trained 2D diffusion model and a 3D diffusion model via our elegantly designed process that synchronizes the two diffusion models at both training and sampling time. The synergy between the 2D and 3D diffusion models brings two major advantages: 1) 2D helps 3D in generalization: the pretrained 2D model has strong generalization ability to unseen images, providing strong shape priors for the 3D diffusion model; 2) 3D helps 2D in multi-view consistency: the 3D diffusion model enhances the 3D consistency of the 2D multi-view sampling process, resulting in more accurate multi-view generation. We validate our idea through extensive experiments in image-based object and clothed-avatar generation tasks. Results show that our method generates realistic 3D objects and avatars with high-fidelity geometry and texture. Extensive ablations also validate our design choices and demonstrate strong generalization ability to diverse clothing and compositional shapes. Our code and pretrained models will be publicly released on https://yuxuan-xue.com/gen-3diffusion.
Problem

Research questions and friction points this paper is trying to address.

Generating realistic 3D objects from single RGB images
Ensuring multi-view consistency in 3D generation process
Combining 2D generalization with 3D consistency for avatars
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synergizes 2D and 3D diffusion models
Uses 2D diffusion for generalization priors
Employs 3D diffusion for view consistency