AI Summary
This work addresses the problem that 3D Gaussian Splatting (3DGS) cannot effectively exploit the Tensor Cores of modern GPUs, which limits its use in real-time rendering. To overcome this limitation, the authors present the first formulation that equivalently recasts the 3DGS rasterization blending operations as a general matrix multiplication (GEMM), enabling acceleration with Tensor Cores. They further design high-performance CUDA kernels with a three-stage double-buffered pipeline that overlaps computation and memory transfers. The proposed method achieves a 1.42× speedup over the original 3DGS implementation and, when combined with existing acceleration techniques, yields an additional 1.47× speedup on average, substantially improving the real-time rendering potential of 3DGS.
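To make "GEMM structure" concrete, here is a minimal sketch of one way a per-pixel Gaussian evaluation can be turned into a matrix product. It is our illustration, not necessarily the formulation GEMM-GS uses. It assumes the exponent form of the reference 3DGS rasterizer as we understand it: power = -0.5·(a·dx² + c·dy²) - b·dx·dy, with d = μ - p and conic (a, b, c) taken from the inverse 2D covariance. Expanding the quadratic separates it into a 6-element pixel feature vector and a 6-element per-Gaussian coefficient vector, so a tile of P pixels against G Gaussians becomes a single (P×6)·(6×G) product. The helper names (`direct_power`, `pixel_features`, `gaussian_coeffs`, `power_gemm`) are hypothetical, and plain Python lists stand in for Tensor Core operands.

```python
# Sketch (hypothetical names): evaluating the 3DGS Gaussian exponent for a
# tile of pixels as one matrix product instead of a per-pixel scalar loop.
# Assumed exponent form, with d = mean - pixel and conic (a, b, c) from the
# inverse 2D covariance:  power = -0.5 * (a*dx^2 + c*dy^2) - b*dx*dy

def direct_power(px, py, g):
    """Scalar evaluation for one (pixel, Gaussian) pair."""
    mx, my, a, b, c = g
    dx, dy = mx - px, my - py
    return -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy

def pixel_features(px, py):
    """Pixel-only terms of the expanded quadratic."""
    return [px * px, py * py, px * py, px, py, 1.0]

def gaussian_coeffs(g):
    """Gaussian-only coefficients, matching pixel_features term by term."""
    mx, my, a, b, c = g
    return [
        -0.5 * a,                                               # px^2
        -0.5 * c,                                               # py^2
        -b,                                                     # px*py
        a * mx + b * my,                                        # px
        c * my + b * mx,                                        # py
        -0.5 * a * mx * mx - 0.5 * c * my * my - b * mx * my,   # constant
    ]

def matmul(a, b):
    """Plain (rows x k) @ (k x cols) product standing in for a Tensor Core GEMM."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def power_gemm(pixels, gaussians):
    """Exponents for all (pixel, Gaussian) pairs as (P x 6) @ (6 x G)."""
    feats = [pixel_features(px, py) for px, py in pixels]                    # P x 6
    coeffs_t = [list(col) for col in zip(*map(gaussian_coeffs, gaussians))]  # 6 x G
    return matmul(feats, coeffs_t)                                           # P x G
```

For a 16×16 tile, `power_gemm([(float(x), float(y)) for y in range(16) for x in range(16)], gaussians)` returns a 256×G matrix that matches `direct_power` up to floating-point rounding. The design point is that the Gaussian-dependent work is done once per Gaussian and the per-pixel work once per pixel, leaving only a dense product in the inner loop.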
Abstract
Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency because of its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized pipeline, yet still fails to meet practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that utilizes GPU Tensor Cores through a GEMM-friendly blending transformation, which equivalently reformulates the 3DGS blending process into a GEMM-compatible form. A high-performance CUDA kernel is designed with a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves a $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
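The compositing step can likewise be written as matrix products. Below is a minimal sketch under stated assumptions, not GEMM-GS's actual kernel: each pixel's front-to-back blend C = Σ_i c_i α_i Π_{j<i}(1 - α_j) is rewritten so that the transmittance prefix product becomes a GEMM in log space (multiplying log(1 - α) by a strictly upper-triangular matrix of ones), and the final color becomes a second GEMM of blending weights with Gaussian colors. The α clamp at 0.99 and the 1/255 skip threshold follow the reference 3DGS rasterizer as we understand it; early ray termination is omitted, and all function names are hypothetical.

```python
import math

# Assumed constants, mirroring the reference 3DGS rasterizer as we understand it.
ALPHA_MAX = 0.99          # alpha is clamped to at most 0.99
ALPHA_MIN = 1.0 / 255.0   # Gaussians with smaller alpha are skipped

def clamp_alpha(a):
    a = min(ALPHA_MAX, a)
    return 0.0 if a < ALPHA_MIN else a   # a skipped Gaussian acts like alpha = 0

def blend_sequential(alphas, colors):
    """Reference per-pixel front-to-back loop (early termination omitted).
    alphas: P rows of N alphas (depth-sorted); colors: N RGB triples."""
    out = []
    for row in alphas:
        c, t = [0.0, 0.0, 0.0], 1.0
        for a, col in zip(row, colors):
            a = clamp_alpha(a)
            for k in range(3):
                c[k] += col[k] * a * t
            t *= 1.0 - a
        out.append(c)
    return out

def matmul(a, b):
    """Plain (rows x k) @ (k x cols) product standing in for a Tensor Core GEMM."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def blend_gemm(alphas, colors):
    """Same result expressed as two matrix products (hypothetical formulation)."""
    n = len(colors)
    a = [[clamp_alpha(x) for x in row] for row in alphas]
    logs = [[math.log1p(-x) for x in row] for row in a]   # log(1 - alpha), finite since alpha <= 0.99
    # U[j][i] = 1 if j < i, so (logs @ U)[p][i] = sum_{j<i} log(1 - alpha_pj)
    u = [[1.0 if j < i else 0.0 for i in range(n)] for j in range(n)]
    log_t = matmul(logs, u)                                # GEMM 1: log transmittance
    weights = [[x * math.exp(lt) for x, lt in zip(ar, lr)] for ar, lr in zip(a, log_t)]
    return matmul(weights, colors)                         # GEMM 2: P x N @ N x 3
```

The log-space rewrite relies on α < 1, which the 0.99 clamp guarantees. A practical GPU kernel would still need to handle early termination and work on tiles of pixels and Gaussians rather than whole lists; the sketch only shows that the blend is algebraically expressible as dense products.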