3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

📅 2024-10-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-view diffusion models rely on 2D network architectures that lack intrinsic 3D priors, yielding geometrically inconsistent 3D outputs. To address this, the paper proposes 3D-Adapter, a plug-in module that endows pretrained 2D diffusion models with 3D geometry awareness without modifying their architecture. Its key contributions are: (1) 3D feedback augmentation, which at each denoising step decodes intermediate multi-view features into a coherent 3D representation, re-renders RGB-D views, and re-encodes them to augment the base model via feature addition; and (2) two complementary variants, a fast feed-forward version based on Gaussian splatting and a versatile training-free version built on neural fields and meshes. Experiments show that 3D-Adapter substantially improves the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, and enables high-quality 3D generation from the plain text-to-image Stable Diffusion. The method delivers strong results across text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.

📝 Abstract
Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to our approach is the idea of 3D feedback augmentation: for each denoising step in the sampling loop, 3D-Adapter decodes intermediate multi-view features into a coherent 3D representation, then re-encodes the rendered RGBD views to augment the pretrained base model through feature addition. We study two variants of 3D-Adapter: a fast feed-forward version based on Gaussian splatting and a versatile training-free version utilizing neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter not only greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, but also enables high-quality 3D generation using the plain text-to-image Stable Diffusion. Furthermore, we showcase the broad application potential of 3D-Adapter by presenting high quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.
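The sampling loop the abstract describes can be sketched as a minimal, illustrative simulation: at each denoising step, the multi-view latents are lifted to a single shared 3D proxy, re-rendered per view, and mixed back in by feature addition. All names (`denoise_step`, `decode_to_3d`, `render_rgbd`, `encode_features`, `feedback_weight`) and the toy averaging "reconstruction" are stand-ins for exposition, not the authors' actual implementation:

```python
import numpy as np

def denoise_step(x, t):
    # Stand-in for one step of the pretrained 2D diffusion model's sampler.
    return x * 0.9

def decode_to_3d(views):
    # Stand-in for reconstructing a coherent 3D representation (e.g. Gaussian
    # splats, a neural field, or a mesh) from intermediate multi-view features.
    # A crude proxy: average the views into one shared representation.
    return views.mean(axis=0)

def render_rgbd(rep, n_views):
    # Stand-in for differentiably re-rendering RGB-D views of the 3D proxy.
    return np.stack([rep] * n_views)

def encode_features(rgbd):
    # Stand-in for re-encoding the rendered views into the base model's
    # feature space (identity here, for simplicity).
    return rgbd

def sample_with_3d_feedback(n_views=4, steps=10, feedback_weight=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_views, 8, 8))  # noisy multi-view "latents"
    for t in range(steps):
        x = denoise_step(x, t)
        rep = decode_to_3d(x)                 # lift views to one 3D proxy
        rgbd = render_rgbd(rep, n_views)      # re-render consistent views
        fb = encode_features(rgbd)
        # 3D feedback augmentation: blend rendered features back in
        # by feature addition, pulling views toward geometric consistency.
        x = (1 - feedback_weight) * x + feedback_weight * fb
    return x
```

Because each step mixes every view toward the shared 3D proxy, the cross-view spread of the latents shrinks over the loop, which is the closed-loop consistency effect the module is after.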
Problem

Research questions and friction points this paper is trying to address.

Enhance geometric consistency in 3D generation
Integrate 3D geometry into diffusion models
Improve quality in multi-view 3D tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D feedback augmentation
Gaussian splatting
Neural fields and meshes