🤖 AI Summary
3D Gaussian Splatting (3DGS) is inherently discrete, unordered, and permutation-invariant, making direct generative modeling challenging. Method: We propose UV-space Gaussian Splatting (UVGS), which maps unstructured 3D Gaussian point clouds onto a structured 2D multi-channel image via spherical UV parameterization—enabling seamless integration with standard 2D generative models without modifying pre-trained VAEs or latent diffusion models (LDMs), thus achieving zero-shot transfer. Contributions/Results: (1) First structured 3DGS representation based on spherical parameterization; (2) First resolution-scalable generation, where Gaussian count grows with output image resolution; (3) Unified support for unconditional, text-conditional, image-conditional, and scene-inpainting generation. Experiments demonstrate substantially lowered barriers to 3D content generation, with improved fidelity, controllability, and generalization across diverse scenes and conditions.
📝 Abstract
3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to its discrete, unstructured, and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as a multi-channel image whose channels concatenate the Gaussian attributes, such as position, scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space using a carefully designed multi-branch network. The compressed UVGS can be treated as a typical RGB image. Remarkably, we discover that typical VAEs trained with latent diffusion models directly generalize to this new representation without additional training. Our representation makes it effortless to leverage foundational 2D models, such as diffusion models, to directly model 3DGS. Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, making UVGS a scalable alternative to typical 3D backbones. This approach immediately unlocks novel 3DGS generation applications by inheriting the already mature capabilities of 2D generative models. In our experiments, we demonstrate unconditional generation, conditional generation, and inpainting of 3DGS with diffusion models, tasks that were previously non-trivial.
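To make the core idea concrete, the sketch below shows one plausible way to rasterize an unordered set of 3D Gaussians into a structured multi-channel UV image via spherical parameterization. This is an illustration under stated assumptions, not the paper's exact scheme: the function name, the 14-channel layout (position 3 + scale 3 + rotation 4 + color 3 + opacity 1), and the last-writer-wins handling of pixel collisions are all assumptions for the sake of a runnable example.

```python
import numpy as np

def gaussians_to_uvgs(centers, attrs, res=256):
    """Map unordered 3D Gaussians onto a structured (res, res, C) UV image.

    centers: (N, 3) Gaussian means; attrs: (N, C) concatenated attributes
    (e.g., position, scale, rotation, color, opacity). A simplified sketch:
    the actual UVGS mapping may resolve collisions differently.
    """
    # Direction of each Gaussian center relative to the object centroid
    d = centers - centers.mean(axis=0)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)

    theta = np.arccos(np.clip(d[:, 2], -1.0, 1.0))  # polar angle in [0, pi]
    phi = np.arctan2(d[:, 1], d[:, 0])              # azimuth in (-pi, pi]

    # Quantize spherical coordinates to pixel indices on the UV grid
    u = ((phi + np.pi) / (2 * np.pi) * (res - 1)).astype(int)
    v = (theta / np.pi * (res - 1)).astype(int)

    # Each occupied pixel stores one Gaussian's attribute vector;
    # when two Gaussians map to the same pixel, the later one wins here.
    uvgs = np.zeros((res, res, attrs.shape[1]), dtype=np.float32)
    uvgs[v, u] = attrs
    return uvgs
```

Because the output is an ordinary image tensor, increasing `res` directly increases the number of representable Gaussians, which is the resolution-scalability property the paper highlights.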