🤖 AI Summary
This work addresses two key challenges in feed-forward native 3D generation: (1) misalignment between latent spaces and 3D geometry, and (2) the trade-off between geometric detail fidelity and computational efficiency. We propose Atlas Gaussians, a novel 3D representation that models shapes as a collection of locally UV-parameterized atlas patches, each decoded by a learnable network into a theoretically infinite 3D Gaussian point cloud. Our method introduces a patch-wise, UV-driven infinite point cloud generation paradigm, integrating local geometry-aware encoding, Transformer-based sequence modeling, and an efficient Gaussian decoding network, all trained end-to-end within a unified VAE-LDM framework. On native 3D generation tasks, our approach significantly outperforms state-of-the-art methods, producing outputs with rich geometric detail and high visual fidelity, while enabling real-time feed-forward inference.
📝 Abstract
Latent diffusion models have proven effective for developing novel 3D generation techniques. To harness a latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, and each patch decodes to 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large number of 3D Gaussians enables the generation of high-quality details. Moreover, due to the local awareness of the representation, the transformer-based decoding procedure operates at the patch level, ensuring efficiency. We train a variational autoencoder to learn the Atlas Gaussians representation, and then apply a latent diffusion model on its latent space to learn 3D generation. Experiments show that our approach outperforms prior art in feed-forward native 3D generation. Project page: https://yanghtr.github.io/projects/atlas_gaussians.
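To make the UV-based sampling idea concrete, the sketch below (an illustration, not the authors' implementation) samples continuous UV coordinates inside one patch and maps each UV sample, concatenated with the patch's feature vector, through a placeholder random-weight MLP to per-point 3D Gaussian parameters (position, scale, rotation quaternion, opacity, color). Because UVs are drawn from the continuous square [0, 1]², the same fixed-size patch feature can decode arbitrarily many Gaussians; the function name `decode_gaussians`, the tiny network, and the 14-dimensional output split are all assumptions for illustration.

```python
import math
import random

def decode_gaussians(patch_feat, num_points, w1, w2, rng):
    """Decode `num_points` 3D Gaussians from one patch feature vector.

    UV coordinates are sampled continuously in [0, 1]^2, so a single
    fixed-size feature can yield a theoretically infinite point cloud.
    """
    gaussians = []
    for _ in range(num_points):
        uv = [rng.random(), rng.random()]         # continuous UV sample in the patch
        x = uv + patch_feat                       # concatenate UV with patch feature
        h = [math.tanh(sum(xi * wij for xi, wij in zip(x, row))) for row in w1]
        out = [sum(hi * wij for hi, wij in zip(h, row)) for row in w2]
        # Split the 14-dim output: 3 position + 3 scale + 4 rotation + 1 opacity + 3 color
        pos = out[0:3]
        scale = [math.exp(s) for s in out[3:6]]   # positive scales via exp
        rot = out[6:10]
        norm = math.sqrt(sum(r * r for r in rot)) or 1.0
        rot = [r / norm for r in rot]             # normalize to a unit quaternion
        opacity = 1.0 / (1.0 + math.exp(-out[10]))  # sigmoid to (0, 1)
        color = out[11:14]
        gaussians.append((pos, scale, rot, opacity, color))
    return gaussians

# Hypothetical sizes: an 8-dim patch feature and a 16-unit hidden layer.
rng = random.Random(0)
feat_dim, hidden = 8, 16
w1 = [[rng.gauss(0, 0.1) for _ in range(2 + feat_dim)] for _ in range(hidden)]
w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(14)]
patch_feat = [rng.gauss(0, 1.0) for _ in range(feat_dim)]
gaussians = decode_gaussians(patch_feat, 256, w1, w2, rng)
print(len(gaussians))  # 256
```

In the paper's full pipeline this decoding is driven by learned, transformer-produced patch features rather than random weights, and all patches are decoded in parallel; the point here is only that the number of decoded Gaussians is a free sampling choice, decoupled from the latent size.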