SPTCStencil: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swap

📅 2025-06-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the efficiency bottleneck of stencil computation on tensor cores (TCs), which stems from redundant zero-padding when stencil operators are converted into dense matrix multiplication, this paper proposes SPTCStencil, the first system to accelerate scientific computing workloads with sparse tensor cores (SpTCs). The method introduces a strided-swap sparse transformation that losslessly maps stencil operators onto the sparse GEMM format natively supported by SpTC hardware. The authors further design a high-performance GPU kernel and system-level optimizations tailored to SpTC architectural features. The work extends sparse tensor cores beyond deep learning into stencil-based scientific simulation. Experimental evaluation shows average speedups of 5.46× over conventional CPU/GPU implementations and 2.00× over dense TC-based approaches.

📝 Abstract

Stencil computation, a pivotal numerical method in science and engineering, iteratively updates grid points using weighted neighbor contributions and exhibits strong parallelism on multi-core processors. Current optimization techniques for running stencil computation on tensor core accelerators incur substantial overheads due to redundant zero-padding during the transformation to matrix multiplication. To address this, we introduce a sparse computation paradigm that eliminates these inefficiencies by exploiting specialized hardware units. This paper exploits the sparsity in these matrices as a feature and presents SPTCStencil, a high-performance stencil computation system accelerated by Sparse Tensor Cores (SpTCs). SPTCStencil is the first to harness SpTCs for acceleration beyond deep learning domains. First, our approach generalizes an efficient transformation of stencil computation into matrix multiplications and specializes this conversion for SpTC compatibility through a novel sparsification strategy. Furthermore, SPTCStencil incorporates a high-performance GPU kernel with systematic optimizations designed to maximize efficiency on SpTCs. Experimental evaluations demonstrate that SPTCStencil outperforms conventional CPU/GPU implementations by 5.46$\times$ and Tensor Core-based approaches by 2.00$\times$ on average.
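To make the zero-padding overhead concrete, the sketch below expresses a 1D 3-point stencil as a multiplication by a banded matrix and measures how much of that matrix is zero. It is a simplified illustration, not the transformation used in the paper: the weights, the grid size, and the helper names `stencil_direct`, `stencil_matrix`, and `matvec` are all assumptions chosen for the example.

```python
def stencil_direct(u, w):
    """Apply a 1D stencil with weights w to the interior points of u."""
    r = len(w) // 2  # stencil radius
    return [
        sum(w[k] * u[i - r + k] for k in range(len(w)))
        for i in range(r, len(u) - r)
    ]


def stencil_matrix(n, w):
    """Build the banded matrix whose product with u equals the stencil output."""
    r = len(w) // 2
    rows = []
    for i in range(r, n - r):
        row = [0.0] * n              # every entry outside the band is zero padding
        for k, wk in enumerate(w):
            row[i - r + k] = wk
        rows.append(row)
    return rows


def matvec(A, u):
    """Dense matrix-vector product, zeros included."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


# Hypothetical input: 8 grid points and a 3-point averaging stencil.
u = [float(i * i) for i in range(8)]
w = [0.25, 0.5, 0.25]

A = stencil_matrix(len(u), w)
direct = stencil_direct(u, w)
via_matrix = matvec(A, u)

# Fraction of matrix entries that are padding zeros.
zero_fraction = sum(v == 0.0 for row in A for v in row) / (len(A) * len(A[0]))
print(direct == via_matrix, zero_fraction)
```

Even in this small case, 62.5% of the matrix entries are zeros that a dense matrix multiplication would still process, and the fraction grows with the grid size.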

Problem

Research questions and friction points this paper is trying to address.

Optimizing stencil computation for sparse tensor cores

Reducing overhead from zero-padding in matrix transformations

Extending sparse tensor core acceleration beyond deep learning applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Tensor Core acceleration for stencil computation

Novel sparsification strategy for SpTC compatibility

High-performance GPU kernel for SpTC efficiency
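The summary does not spell out what SpTC compatibility requires, so the sketch below only illustrates the constraint, assuming the 2:4 structured-sparsity format used by NVIDIA's Sparse Tensor Cores (at most two nonzeros in each aligned group of four elements). It does not reproduce the paper's sparsification strategy, and the helper names `is_2_4_sparse` and `compress_2_4` are hypothetical.

```python
def is_2_4_sparse(row):
    """Return True if every aligned group of 4 elements has at most 2 nonzeros."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    return all(
        sum(v != 0 for v in row[i:i + 4]) <= 2
        for i in range(0, len(row), 4)
    )


def compress_2_4(row):
    """Compress a 2:4-sparse row into kept values plus their in-group positions."""
    if not is_2_4_sparse(row):
        raise ValueError("row does not satisfy the 2:4 pattern")
    values, indices = [], []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = [j for j, v in enumerate(group) if v != 0]
        # Pad with zero positions so each group stores exactly two entries.
        for j in range(4):
            if len(keep) == 2:
                break
            if j not in keep:
                keep.append(j)
        keep.sort()
        values.extend(group[j] for j in keep)
        indices.extend(keep)
    return values, indices


# A row that already fits the pattern: two nonzeros per group of four.
row_ok = [0.25, 0.5, 0.0, 0.0, 0.0, 0.0, 0.25, 0.5]
# A contiguous 3-point stencil band: three nonzeros land in one group.
row_bad = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]

print(is_2_4_sparse(row_ok), is_2_4_sparse(row_bad))
print(compress_2_4(row_ok))
```

The second row shows why a contiguous stencil band cannot be handed to the hardware as-is: its nonzeros cluster inside one group, so the matrix has to be rearranged before it fits the format.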
