🤖 AI Summary
This work addresses two key challenges in feed-forward native 3D generation: (1) misalignment between latent spaces and 3D geometry, and (2) the trade-off between geometric detail fidelity and computational efficiency. We propose Atlas Gaussians, a novel 3D representation that models shapes as a collection of locally UV-parameterized atlas patches, each decoded by a learnable network into a theoretically infinite 3D Gaussian point cloud. Our method introduces a patch-wise, UV-driven infinite point cloud generation paradigm, integrating local geometry-aware encoding, Transformer-based sequence modeling, and an efficient Gaussian decoding network, all trained end-to-end within a unified VAE-LDM framework. On native 3D generation tasks, our approach significantly outperforms state-of-the-art methods, producing outputs with rich geometric detail and high visual fidelity, while enabling real-time feed-forward inference.
📝 Abstract
Latent diffusion models have proven effective for developing novel 3D generation techniques. To harness a latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, and each patch decodes to 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large number of 3D Gaussians enables the generation of high-quality details. Moreover, due to the local awareness of the representation, the transformer-based decoding procedure operates at the patch level, ensuring efficiency. We train a variational autoencoder to learn the Atlas Gaussians representation, and then apply a latent diffusion model on its latent space to learn 3D generation. Experiments show that our approach outperforms prior art in feed-forward native 3D generation. Project page: https://yanghtr.github.io/projects/atlas_gaussians.
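To make the UV-based sampling idea concrete, the sketch below (an illustration, not the authors' implementation) samples continuous UV coordinates inside one patch and maps each UV sample, concatenated with the patch's feature vector, through a placeholder random-weight MLP to per-point 3D Gaussian parameters (position, scale, rotation quaternion, opacity, color). Because UVs are drawn from the continuous square [0, 1]², the same fixed-size patch feature can decode arbitrarily many Gaussians; the function name `decode_gaussians`, the tiny network, and the 14-dimensional output split are all assumptions for illustration.

```python
import math
import random

def decode_gaussians(patch_feat, num_points, w1, w2, rng):
    """Decode `num_points` 3D Gaussians from one patch feature vector.

    UV coordinates are sampled continuously in [0, 1]^2, so a single
    fixed-size feature can yield a theoretically infinite point cloud.
    """
    gaussians = []
    for _ in range(num_points):
        uv = [rng.random(), rng.random()]         # continuous UV sample in the patch
        x = uv + patch_feat                       # concatenate UV with patch feature
        h = [math.tanh(sum(xi * wij for xi, wij in zip(x, row))) for row in w1]
        out = [sum(hi * wij for hi, wij in zip(h, row)) for row in w2]
        # Split the 14-dim output: 3 position + 3 scale + 4 rotation + 1 opacity + 3 color
        pos = out[0:3]
        scale = [math.exp(s) for s in out[3:6]]   # positive scales via exp
        rot = out[6:10]
        norm = math.sqrt(sum(r * r for r in rot)) or 1.0
        rot = [r / norm for r in rot]             # normalize to a unit quaternion
        opacity = 1.0 / (1.0 + math.exp(-out[10]))  # sigmoid to (0, 1)
        color = out[11:14]
        gaussians.append((pos, scale, rot, opacity, color))
    return gaussians

# Hypothetical sizes: an 8-dim patch feature and a 16-unit hidden layer.
rng = random.Random(0)
feat_dim, hidden = 8, 16
w1 = [[rng.gauss(0, 0.1) for _ in range(2 + feat_dim)] for _ in range(hidden)]
w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(14)]
patch_feat = [rng.gauss(0, 1.0) for _ in range(feat_dim)]
gaussians = decode_gaussians(patch_feat, 256, w1, w2, rng)
print(len(gaussians))  # 256
```

In the paper's full pipeline this decoding is driven by learned, transformer-produced patch features rather than random weights, and all patches are decoded in parallel; the point here is only that the number of decoded Gaussians is a free sampling choice, decoupled from the latent size.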