GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending

📅 2026-04-02
🤖 AI Summary
This work addresses the fact that 3D Gaussian Splatting (3DGS) does not make effective use of modern GPU Tensor Cores, which limits its suitability for real-time rendering. The authors present the first formulation that equivalently recasts the rasterization blending operations of 3DGS as general matrix multiplication (GEMM), so that blending can run on Tensor Cores. They further design high-performance CUDA kernels built around a three-stage double-buffered pipeline that overlaps computation with memory transfers. The method achieves a 1.42× speedup over the original 3DGS implementation and, when combined with existing optimization techniques, yields an additional 1.47× speedup on average.
📝 Abstract
Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency because of its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized pipeline, yet it still fails to meet practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that utilizes Tensor Cores on GPUs via a GEMM-friendly blending transformation. It equivalently reformulates the 3DGS blending process into a GEMM-compatible form to utilize Tensor Cores. A high-performance CUDA kernel is designed, integrating a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
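To make the idea of a GEMM-compatible blending step concrete, here is a minimal Python sketch. It is not the paper's formulation, which is not spelled out on this page. It uses the standard front-to-back alpha compositing rule and shows that, once per-pixel blending weights are known, the colors of a whole pixel tile come from a single matrix product of the kind Tensor Cores accelerate. All function names and shapes are hypothetical, and the sketch ignores practical details such as early termination.

```python
# Illustrative sketch only: names, shapes, and structure are assumptions,
# not taken from the GEMM-GS paper or its released code.

def blend_sequential(alphas, colors):
    """Reference front-to-back alpha blending for a single pixel.

    alphas[i] is the opacity of depth-sorted Gaussian i at this pixel,
    colors[i] is its (r, g, b) color.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for a, c in zip(alphas, colors):
        w = a * transmittance          # contribution weight of Gaussian i
        for k in range(3):
            out[k] += w * c[k]
        transmittance *= 1.0 - a       # light remaining after Gaussian i
    return out


def blend_weights(alphas):
    """Weights w_i = alpha_i * prod_{j<i} (1 - alpha_j) for one pixel."""
    weights = []
    transmittance = 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def matmul(A, B):
    """Plain (P x N) @ (N x K) matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def blend_tile_as_gemm(tile_alphas, colors):
    """Blend a tile of P pixels against N Gaussians with one matrix product.

    tile_alphas is P x N (per-pixel alphas), colors is N x 3.
    Returns a P x 3 list of pixel colors.
    """
    W = [blend_weights(row) for row in tile_alphas]  # P x N weight matrix
    return matmul(W, colors)                         # the GEMM step


if __name__ == "__main__":
    colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    tile_alphas = [[0.5, 0.5], [0.25, 0.0]]
    print(blend_tile_as_gemm(tile_alphas, colors))
    print([blend_sequential(row, colors) for row in tile_alphas])
```

Both calls compute the same pixel colors. The difference is that the second form isolates the accumulation over Gaussians as a dense product, which is the general shape of computation that matrix hardware is built for.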
Problem

Research questions and friction points this paper is trying to address.

3D Gaussian Splatting
Tensor Cores
GEMM
real-time rendering
GPU acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

GEMM
Tensor Cores
3D Gaussian Splatting
real-time rendering
CUDA optimization