GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes GaussianGPT, the first autoregressive generative model for explicit 3D Gaussian representations based on a causal Transformer architecture. While existing 3D generation methods predominantly rely on diffusion or flow-matching frameworks, this study explores the underutilized potential of autoregressive modeling in explicit 3D scene synthesis. The approach employs a sparse 3D convolutional autoencoder combined with vector quantization to compress Gaussian primitives into a discrete latent grid, and introduces a 3D rotary positional embedding to enable effective sequence modeling. The framework supports diverse generation tasks—including scene completion, outpainting, temperature-controlled sampling, and variable-length synthesis—while preserving compositional inductive biases. GaussianGPT achieves high-quality, context-aware 3D generation and significantly outperforms conventional holistic optimization methods, offering superior flexibility and controllability.
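The temperature-controlled sampling mentioned above is standard autoregressive decoding over the discrete codebook; the paper's exact decoding loop is not shown here, but the mechanism can be sketched as follows (function name and NumPy implementation are illustrative, not from the paper):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Draw the next codebook token from a categorical distribution.

    temperature -> 0 approaches greedy (argmax) decoding;
    temperature > 1 flattens the distribution, increasing diversity.
    """
    rng = rng or np.random.default_rng(0)
    if temperature <= 0:
        return int(np.argmax(logits))       # greedy decoding
    z = logits / temperature
    z -= z.max()                            # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(logits), p=p))
```

In a full generation loop, this would be called once per latent-grid token, appending each sampled token to the context before the next transformer forward pass.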
📝 Abstract
Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while operating on explicit representations compatible with modern neural rendering pipelines, positioning autoregressive transformers as a complementary paradigm for controllable and context-aware 3D generation.
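The 3D rotary positional embedding in the abstract generalizes standard 1D RoPE to grid coordinates; one common construction (assumed here, since the paper's exact variant is not given) splits the channel dimension into three groups and applies a 1D rotary rotation per axis using each token's (x, y, z) position. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard 1D rotary embedding: rotate channel pairs by pos-dependent angles.

    x: (n, d) features with d even; pos: (n,) integer positions along one axis.
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos[:, None] * freqs               # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def rope_3d(x, coords):
    """Per-axis 3D rotary embedding for serialized latent-grid tokens.

    x: (n, d) token features with d divisible by 6;
    coords: (n, 3) integer grid coordinates (x, y, z).
    """
    n, d = x.shape
    g = d // 3                                   # channels per spatial axis
    out = [rope_1d(x[:, i * g:(i + 1) * g], coords[:, i]) for i in range(3)]
    return np.concatenate(out, axis=-1)
```

At the grid origin the rotation angles are zero, so features pass through unchanged; relative offsets between tokens then enter attention as phase differences, which is what makes the encoding translation-aware.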
Problem

Research questions and friction points this paper is trying to address.

autoregressive
3D Gaussian
scene generation
transformer
generative modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive generation
3D Gaussian splatting
transformer
discrete latent representation
neural rendering
Nicolas von Lützow
Technical University of Munich, Germany
Barbara Rössle
Technical University of Munich, Germany
Katharina Schmid
Technical University of Munich, Germany
Matthias Nießner
Professor of Computer Science, Technical University of Munich
Computer Graphics · Computer Vision · Artificial Intelligence · Machine Learning