🤖 AI Summary
Gaussian splatting excels in novel-view synthesis for small-scale scenes but suffers from three key limitations: boundary artifacts induced by chunk-based scene partitioning, difficulty in multi-scale training, and GPU memory constraints that hinder city-scale scene rendering. To address these, we propose a hierarchical Gaussian representation coupled with an external-memory streaming rendering framework. Our method introduces a hybrid Gaussian point-tree data structure to store the entire scene in CPU memory, integrates viewpoint-dependent level-of-detail (LOD) selection with lightweight cache scheduling for dynamic on-demand loading of visible Gaussians, and leverages temporal coherence to enhance loading efficiency. This is the first approach enabling unified training and real-time rendering of ultra-large-scale scenes—from aerial to street-level—on a single consumer-grade GPU. It eliminates chunk-boundary artifacts and supports seamless, interactive multi-scale visualization.
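To make the viewpoint-dependent LOD selection over a Gaussian hierarchy concrete, here is a minimal sketch. All names, the screen-space error metric, and the tree layout are illustrative assumptions, not the paper's actual data structure: each interior node holds a coarse proxy Gaussian summarizing its subtree, and the renderer collects the coarsest "cut" whose projected footprint stays below a pixel-error budget.

```python
import math
from dataclasses import dataclass, field

@dataclass
class GaussianNode:
    """Hypothetical node of a Gaussian hierarchy: an interior node stores
    a coarse proxy Gaussian for its subtree; a leaf is an original Gaussian."""
    center: tuple                    # (x, y, z) world-space position
    radius: float                    # bounding radius of the subtree
    children: list = field(default_factory=list)

def select_lod_cut(node, cam_pos, focal_px, max_px_error=1.0):
    """Return the coarsest set of nodes whose projected size error
    stays below ~max_px_error pixels for this viewpoint."""
    dist = math.dist(node.center, cam_pos)
    proj_px = focal_px * node.radius / max(dist, 1e-6)  # crude screen footprint
    if not node.children or proj_px <= max_px_error:
        return [node]                # coarse proxy is accurate enough here
    cut = []
    for child in node.children:      # otherwise descend to finer detail
        cut.extend(select_lod_cut(child, cam_pos, focal_px, max_px_error))
    return cut

# Tiny example: a root proxy summarizing two leaf Gaussians.
leaf_a = GaussianNode(center=(0.0, 0.0, -1.0), radius=1.0)
leaf_b = GaussianNode(center=(0.0, 0.0, 1.0), radius=1.0)
root = GaussianNode(center=(0.0, 0.0, 0.0), radius=10.0,
                    children=[leaf_a, leaf_b])

far_cut = select_lod_cut(root, cam_pos=(0.0, 0.0, 20000.0), focal_px=1000.0)
near_cut = select_lod_cut(root, cam_pos=(0.0, 0.0, 100.0), focal_px=1000.0)
# Far camera keeps only the coarse proxy; near camera descends to both leaves.
```

Because the cut shrinks with distance, a city-scale flyover touches only a small, coarse subset of the tree, which is what makes keeping the full scene out-of-core feasible.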
📝 Abstract
Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks -- a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU -- without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes -- from broad aerial views to fine-grained ground-level details.
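The caching and view-scheduling idea above can be illustrated with a small sketch. This is an assumed LRU-style cache, not the paper's actual scheduler: because consecutive frames see nearly the same Gaussians, most requests hit GPU-resident data and only the small delta entering the view is streamed from CPU memory.

```python
from collections import OrderedDict

class GaussianCache:
    """Illustrative GPU-side cache for streamed Gaussians (an LRU sketch).
    Temporal coherence means successive frames request mostly the same
    node ids, so per-frame CPU-to-GPU traffic is just the visibility delta."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()        # node_id -> Gaussian payload

    def request_frame(self, visible_ids, load_fn):
        """Ensure all visible nodes are resident; return ids newly streamed in."""
        loaded = []
        for nid in visible_ids:
            if nid in self.resident:
                self.resident.move_to_end(nid)      # refresh LRU position
            else:
                self.resident[nid] = load_fn(nid)   # stream from CPU memory
                loaded.append(nid)
        while len(self.resident) > self.capacity:
            self.resident.popitem(last=False)       # evict least-recently used
        return loaded

cache = GaussianCache(capacity=4)
first = cache.request_frame([1, 2, 3], load_fn=lambda nid: f"gauss{nid}")
second = cache.request_frame([2, 3, 4], load_fn=lambda nid: f"gauss{nid}")
# The first frame streams everything; the second, temporally coherent
# frame streams only the single node that entered the view.
```

The design point is that eviction and loading are decoupled from rendering: the renderer always draws from the resident set, while the cache amortizes transfers across frames.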