🤖 AI Summary
To address the memory blow-up and scale limits of city-scale 3D scene generation, this paper proposes GaussianCity, a generative framework that synthesizes unbounded 3D cities in a single feed-forward pass. It builds on 3D Gaussian Splatting and combines a bird's-eye-view (BEV) representation, a Point Serializer, and a decoding network. Its core contributions are: (1) BEV-Point, a compact intermediate representation that keeps VRAM usage from growing out of control as the scene expands, which is what makes unbounded city generation feasible; and (2) a spatial-aware Gaussian attribute decoder that uses the Point Serializer to integrate the structural and contextual characteristics of BEV points. Evaluated on both drone-view and street-view 3D city generation, GaussianCity achieves state-of-the-art generation quality and runs at 10.72 FPS, roughly 60× faster than CityDreamer (0.18 FPS).
📝 Abstract
3D city generation with NeRF-based methods shows promising results but is computationally inefficient. Recently, 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues), arising from the need to expand points to billions, often demanding hundreds of gigabytes of VRAM for a city scene spanning 10 km². In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that the growth in VRAM usage for unbounded scenes remains constant, thus enabling unbounded city generation. 2) Spatial-aware Gaussian Attribute Decoder: We present a spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages a Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS vs. 0.18 FPS).
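The two quantitative claims in the abstract can be sanity-checked with a short calculation. The sketch below is not from the paper: it assumes the per-Gaussian attribute layout commonly used in 3D-GS implementations (59 float32 values per point), which GaussianCity may not share, and the function names `gaussian_memory_gb` and `speedup` are hypothetical helpers introduced only for illustration.

```python
# Back-of-envelope check of the abstract's figures.
# ASSUMPTION: the per-Gaussian layout below is the common 3D-GS layout,
# not something stated in the GaussianCity abstract.

# Assumed float32 attributes per Gaussian:
# position (3) + scale (3) + rotation quaternion (4) + opacity (1)
# + degree-3 spherical-harmonic colour coefficients (16 per channel * 3 = 48)
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # 59
BYTES_PER_FLOAT = 4


def gaussian_memory_gb(num_points: int) -> float:
    """Raw attribute storage, in decimal gigabytes, for num_points Gaussians."""
    return num_points * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e9


def speedup(fast_fps: float, slow_fps: float) -> float:
    """Ratio of two frame rates."""
    return fast_fps / slow_fps


if __name__ == "__main__":
    # One billion explicitly stored Gaussians under the assumed layout.
    print(f"{gaussian_memory_gb(1_000_000_000):.0f} GB per billion points")
    # Frame rates quoted in the abstract (GaussianCity vs. CityDreamer).
    print(f"{speedup(10.72, 0.18):.1f}x speedup")
```

Under these assumptions, a billion explicitly stored Gaussians take about 236 GB for attributes alone, which is consistent with the "hundreds of gigabytes" figure, and the quoted frame rates give a ratio of about 59.6, which the abstract rounds to 60.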