🤖 AI Summary
This work addresses the challenge of scaling 3D Gaussian splatting to billion-scale primitives, which is severely constrained by single-GPU memory capacity. To overcome this limitation, the authors propose a synergistic framework combining chunked virtualized geometry, a hierarchical asynchronous I/O-compute overlapping pipeline, and trajectory-adaptive differential data streaming. This out-of-core training architecture treats GPU memory as a working-set cache and leverages a three-tier SSD–CPU–GPU storage hierarchy to efficiently manage massive parameter sets. The approach enables, for the first time, successful training of over one billion Gaussians on a single 24GB GPU, achieving superior reconstruction quality compared to existing single-GPU methods and surpassing prior scalability limits of approximately 100 million primitives with out-of-core techniques and 11 million with in-core approaches.
📝 Abstract
Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU hardware. We observe that 3DGS training is inherently sparse and trajectory-conditioned: each iteration activates only the Gaussians visible from the current camera batch, so GPU memory can serve as a working-set cache rather than a persistent parameter store. Building on this insight, we introduce TideGS, an out-of-core training framework that manages parameters across an SSD-CPU-GPU hierarchy via three synergistic techniques: block-virtualized geometry for SSD-aligned spatial locality, a hierarchical asynchronous pipeline to overlap I/O with computation, and trajectory-adaptive differential streaming that transfers only incremental working-set deltas between iterations. Experiments show that TideGS enables training with over one billion Gaussians on a single 24 GB GPU while achieving the best reconstruction quality among evaluated single-GPU baselines on large-scale scenes, scaling beyond prior out-of-core baselines (e.g., approximately 100M Gaussians) and standard in-memory training (e.g., approximately 11M Gaussians).