🤖 AI Summary
3D diffusion models lag behind their 2D counterparts because high-quality 3D training data is scarce. To address this, the work proposes leveraging pretrained 2D diffusion models for 3D content generation. The method introduces: (1) a differentiable Gaussian Atlas representation, which maps 3D Gaussians onto a dense 2D parameterized grid via manifold unfolding and reparameterization; (2) GaussianVerse, the first large-scale 3D Gaussian dataset, comprising 205K samples; and (3) an end-to-end transfer framework that integrates 2D diffusion priors, Gaussian splatting representations, and geometric regularization. The approach enables text-conditioned generation of 3D Gaussian objects and reports state-of-the-art results on multi-category 3D reconstruction and synthesis benchmarks, outperforming existing 3D diffusion methods and narrowing the performance gap between 2D and 3D generative modeling.
📝 Abstract
Recent advances in text-to-image diffusion models have been driven by the increasing availability of paired 2D data. However, the development of 3D diffusion models has been hindered by the scarcity of high-quality 3D data, resulting in less competitive performance compared to their 2D counterparts. To address this challenge, we propose repurposing pre-trained 2D diffusion models for 3D object generation. We introduce Gaussian Atlas, a novel representation that utilizes dense 2D grids, enabling the fine-tuning of 2D diffusion models to generate 3D Gaussians. Our approach demonstrates successful transfer learning from a pre-trained 2D diffusion model to a 2D manifold flattened from 3D structures. To support model training, we compile GaussianVerse, a large-scale dataset comprising 205K high-quality 3D Gaussian fittings of various 3D objects. Our experimental results show that text-to-image diffusion models can be effectively adapted for 3D content generation, bridging the gap between 2D and 3D modeling.
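To make the idea of a dense 2D grid of 3D Gaussians concrete, the following is a minimal sketch and not the authors' procedure. It assumes a simple equirectangular (latitude/longitude) flattening and a nearest-cell assignment in which later Gaussians overwrite earlier ones on collision; the function name `gaussians_to_atlas`, the grid size, and the attribute layout are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussians_to_atlas(positions, attributes, height=32, width=64):
    """Scatter per-Gaussian parameters into a dense (height, width, 3 + C) grid.

    Assumption: each Gaussian is placed by the spherical angles of its
    centre relative to the object centroid (equirectangular flattening).
    This is an illustrative stand-in, not the paper's actual mapping.
    """
    positions = np.asarray(positions, dtype=np.float64)    # (N, 3) centres
    attributes = np.asarray(attributes, dtype=np.float64)  # (N, C) other params

    # Direction of each Gaussian centre as seen from the centroid.
    centered = positions - positions.mean(axis=0)
    radius = np.maximum(np.linalg.norm(centered, axis=1), 1e-12)
    direction = centered / radius[:, None]

    # Polar angle in [0, pi] and azimuth in [-pi, pi].
    theta = np.arccos(np.clip(direction[:, 2], -1.0, 1.0))
    phi = np.arctan2(direction[:, 1], direction[:, 0])

    # Convert the angles to integer grid cells, clamped to the grid bounds.
    row = np.minimum((theta / np.pi * height).astype(int), height - 1)
    col = np.minimum(((phi + np.pi) / (2.0 * np.pi) * width).astype(int), width - 1)

    # Each occupied cell stores the full parameter vector of one Gaussian.
    atlas = np.zeros((height, width, 3 + attributes.shape[1]))
    mask = np.zeros((height, width), dtype=bool)
    atlas[row, col] = np.concatenate([positions, attributes], axis=1)
    mask[row, col] = True
    return atlas, mask
```

The returned `atlas` has an image-like layout, which is the property that would let a 2D diffusion backbone consume it; `mask` marks which cells actually hold a Gaussian. A real pipeline would need a collision-free, dense assignment rather than this lossy scatter.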