GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Rendering large-scale 3D Gaussian Splatting (3DGS) models in real time on consumer-grade devices for immersive applications (e.g., VR) must reconcile conflicting demands: high fidelity, ultra-low latency, and severe hardware resource constraints. To address this, the authors propose an end-to-end inference framework featuring a novel cache-centric rendering pipeline and an efficiency-aware, elastic multi-GPU scheduler, enabling tight co-optimization across the system and representation levels. The approach integrates 3DGS representation compression, custom CUDA kernels, hierarchical GPU memory caching, and dynamic load-balancing algorithms. Experiments show that the method achieves up to a 5.35× speedup over baseline implementations, reduces latency by 35%, and cuts GPU memory consumption by 42%. It supports high-quality stereo 2K rendering at over 120 FPS on commodity hardware.

📝 Abstract
Rendering large-scale 3D Gaussian Splatting (3DGS) models faces significant challenges in achieving real-time, high-fidelity performance on consumer-grade devices. Fully realizing the potential of 3DGS in applications such as virtual reality (VR) requires addressing critical system-level challenges to support real-time, immersive experiences. We propose GS-Cache, an end-to-end framework that seamlessly integrates 3DGS's advanced representation with a highly optimized rendering system. GS-Cache introduces a cache-centric pipeline to eliminate redundant computations, an efficiency-aware scheduler for elastic multi-GPU rendering, and optimized CUDA kernels to overcome computational bottlenecks. This synergy between 3DGS and system design enables GS-Cache to achieve up to a 5.35x performance improvement, a 35% latency reduction, and 42% lower GPU memory usage, supporting 2K binocular rendering at over 120 FPS with high visual quality. By bridging the gap between 3DGS's representation power and the demands of VR systems, GS-Cache establishes a scalable and efficient framework for real-time neural rendering in immersive environments.
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D Gaussian Splatting rendering
Optimizing multi-GPU performance
Reducing latency and GPU memory usage
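To make the multi-GPU friction point concrete, the sketch below shows one simple way an "elastic" scheduling decision could be framed: choose the smallest number of GPUs whose measured frame rate meets a target. This is an illustration only, not GS-Cache's scheduler. The function name `pick_gpu_count` and the profile numbers are hypothetical and made up for the example; they are not results from the paper.

```python
def pick_gpu_count(measured_fps, target_fps):
    """Return the smallest GPU count whose measured FPS meets the target.

    measured_fps: dict mapping GPU count -> measured frames per second
                  (a hypothetical profile, not data from the paper).
    If no configuration meets the target, fall back to the fastest one.
    """
    meeting = [n for n, fps in measured_fps.items() if fps >= target_fps]
    if meeting:
        return min(meeting)
    return max(measured_fps, key=measured_fps.get)


# Made-up profile purely for illustration.
profile = {1: 70.0, 2: 125.0, 4: 180.0}
chosen = pick_gpu_count(profile, target_fps=120)
print(chosen)
```

Preferring the smallest configuration that meets the target is one reading of "efficiency-aware": extra GPUs are used only when the frame-rate target requires them.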
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cache-centric pipeline design
Efficiency-aware GPU scheduler
Optimized CUDA kernels
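The idea of using a cache to eliminate redundant computation can be pictured with a generic sketch. The code below is not the GS-Cache pipeline; it is a minimal LRU cache, assuming that expensive per-view preparation can be reused when two viewpoints fall in the same grid cell (for example, the left and right eye of one headset pose). The class `RenderCache`, the `cell_size` parameter, and `expensive_preprocess` are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class RenderCache:
    """Tiny LRU cache keyed by a quantized camera position (hypothetical)."""

    def __init__(self, capacity=4, cell_size=0.5):
        self.capacity = capacity
        self.cell_size = cell_size
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, position):
        # Snap the position to a grid cell so nearby viewpoints share an entry.
        return tuple(int(c // self.cell_size) for c in position)

    def get_or_compute(self, position, compute):
        key = self._key(position)
        if key in self._store:
            # Cache hit: mark the entry as most recently used and reuse it.
            self._store.move_to_end(key)
            self.hits += 1
            return self._store[key]
        # Cache miss: do the expensive work once and remember the result.
        self.misses += 1
        value = compute(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return value


def expensive_preprocess(cell):
    # Stand-in for costly per-view work; the real work is an assumption here.
    return f"prepared data for cell {cell}"


cache = RenderCache(capacity=2, cell_size=0.5)
left_eye = (1.00, 1.60, 0.00)
right_eye = (1.06, 1.60, 0.00)  # a few centimetres away, same grid cell
a = cache.get_or_compute(left_eye, expensive_preprocess)
b = cache.get_or_compute(right_eye, expensive_preprocess)
print(cache.hits, cache.misses)
```

In this toy setup the second eye reuses the first eye's result, so the expensive step runs once per cell rather than once per view. How GS-Cache actually keys, stores, and invalidates its cache is not described in the abstract.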
Authors
Miao Tao
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Yuanzhen Zhou
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Haoran Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Zeyu He
Ph.D. Student, Penn State University
Natural Language Processing, HCI, Crowdsourcing
Zhenyu Yang
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Yuchang Zhang
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Zhongling Su
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Linning Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Zhenxiang Ma
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Rong Fu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Hengjie Li
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Xingcheng Zhang
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Jidong Zhai
Tsinghua University
Parallel Computing, Compiler, Programming Model, GPU