CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting

📅 2025-11-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
3D Gaussian Splatting (3DGS) struggles with memory-bound deployment on large-scale scenes, especially on single consumer GPUs. Method: This paper proposes a CPU-GPU collaborative rendering framework tailored for single-GPU systems. Its core innovation is a dynamic Gaussian offloading strategy guided by access-pattern prediction, which migrates inactive Gaussians to CPU memory. To minimize data-migration overhead, it employs computation-communication pipelining and fine-grained scheduling. Crucially, the method preserves the original 3DGS representation and training pipeline: no modifications are required. Results: Evaluated on an RTX 4090, the framework efficiently renders scenes containing up to 100 million Gaussians, achieving state-of-the-art reconstruction quality while reducing GPU memory consumption by 67%. It enables practical deployment of 3DGS on resource-constrained hardware without compromising fidelity or compatibility.
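The offloading idea in the summary can be pictured as a bounded "hot set" of Gaussians resident on the GPU, with cold ones migrated to CPU memory. The sketch below is an illustrative simplification only: the class and method names are invented, and it uses plain least-recently-used eviction where CLM uses access-pattern prediction.

```python
class GaussianOffloader:
    """Hypothetical sketch: keep a bounded hot set of Gaussians in GPU
    memory and evict the least-recently-used ones to CPU memory.
    (CLM's real predictor and data layout are more sophisticated.)"""

    def __init__(self, gpu_budget):
        self.gpu_budget = gpu_budget   # max Gaussians resident on GPU
        self.gpu_resident = {}         # Gaussian id -> last frame accessed
        self.cpu_resident = set()      # ids offloaded to CPU memory

    def touch(self, gaussian_ids, frame):
        """Record that `gaussian_ids` are needed for `frame`: load any that
        were offloaded, then evict the coldest ones if over budget."""
        for gid in gaussian_ids:
            self.cpu_resident.discard(gid)   # bring back from CPU if needed
            self.gpu_resident[gid] = frame
        while len(self.gpu_resident) > self.gpu_budget:
            cold = min(self.gpu_resident, key=self.gpu_resident.get)
            del self.gpu_resident[cold]
            self.cpu_resident.add(cold)      # offload cold Gaussian to CPU
```

For example, with a budget of 3, touching Gaussians 1-3 and then 4 evicts the least recently used Gaussian (id 1) to CPU memory while keeping the working set on the GPU.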

πŸ“ Abstract
3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering and high-quality output. However, scaling 3DGS to large (or intricate) scenes is challenging due to its large memory requirement, which exceeds most GPUs' memory capacity. In this paper, we describe CLM, a system that allows 3DGS to render large scenes using a single consumer-grade GPU, e.g., an RTX 4090. It does so by offloading Gaussians to CPU memory and loading them into GPU memory only when necessary. To reduce performance and communication overheads, CLM uses a novel offloading strategy that exploits observations about 3DGS's memory access pattern for pipelining, thus overlapping GPU-to-CPU communication, GPU computation, and CPU computation. Furthermore, we exploit the same observations about the access pattern to reduce communication volume. Our evaluation shows that the resulting implementation can render a large scene that requires 100 million Gaussians on a single RTX 4090 and achieve state-of-the-art reconstruction quality.
Problem

Research questions and friction points this paper is trying to address.

Reducing GPU memory requirements for 3D Gaussian Splatting
Enabling large-scale scene rendering on consumer GPUs
Optimizing memory access patterns to minimize communication overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offloading Gaussians to CPU memory
Pipelining GPU-CPU communication and computation
Reducing communication volume via access patterns
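The pipelining contribution above amounts to overlapping data transfer with computation so that communication latency is hidden. A minimal double-buffered sketch, assuming invented function names and using a thread to stand in for an asynchronous CUDA transfer:

```python
from concurrent.futures import ThreadPoolExecutor

def render_pipelined(batches, transfer, compute):
    """Hypothetical sketch of computation-communication pipelining:
    while `compute` runs on batch i, prefetch batch i+1 via `transfer`,
    hiding transfer latency behind computation. (CLM overlaps real
    CPU-GPU transfers and kernels; a worker thread simulates that here.)"""
    if not batches:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(transfer, batches[0])
        for i in range(len(batches)):
            data = pending.result()                  # wait for transfer i
            if i + 1 < len(batches):                 # kick off transfer i+1
                pending = prefetcher.submit(transfer, batches[i + 1])
            results.append(compute(data))            # overlaps transfer i+1
    return results
```

The key design point is that the next batch's transfer is issued *before* the current batch's computation starts, so the two proceed concurrently.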
Hexu Zhao
New York University, New York, NY, USA
Xiwen Min
New York University, New York, NY, USA
Xiaoteng Liu
New York University, New York, NY, USA
Moonjun Gong
New York University
Computer Vision, Autonomous Vehicles
Yiming Li
New York University, New York, NY, USA
Ang Li
Pacific Northwest National Laboratory & University of Washington, Seattle, WA, USA
Saining Xie
Courant Institute, New York University
computer vision, machine learning, representation learning, artificial intelligence
Jinyang Li
New York University, New York, NY, USA
Aurojit Panda
NYU
Distributed Systems, Networking, Cluster Computing