🤖 AI Summary
GPU GEMM kernels traditionally rely on time-consuming runtime autotuning to determine optimal launch parameters. Method: This paper proposes an analytical modeling approach that explicitly and jointly models the GPU memory hierarchy, code generation logic, and data layout constraints—enabling prediction of near-optimal kernel configurations solely from matrix dimensions, hardware architectural features, and tiling strategies. Implemented as a lightweight Triton-based GEMM framework, it eliminates runtime tuning entirely. Results: Across multiple GPU generations and GEMM problem sizes, the method achieves ≥95% of the performance attained by state-of-the-art autotuners, while reducing tuning overhead to zero. Its core contribution is the first interpretable, tuning-free analytical model for GEMM parameter prediction—significantly improving deployment efficiency and cross-architecture generalizability.
📝 Abstract
We present tritonBLAS, a fast and deterministic analytical model that uses architectural parameters, such as the cache hierarchy and the relative placement of code and data, to generate performant GPU GEMM kernels. tritonBLAS explicitly models the relationship between architectural topology, matrix shapes, and algorithmic blocking behavior to predict near-optimal configurations without runtime autotuning. Based on this model, we implemented a lightweight GEMM framework entirely within Triton. We evaluate tritonBLAS across a diverse set of GEMM problem sizes on modern GPUs, where it achieves over 95% of the performance of autotuned solutions while reducing autotuning time to zero. This makes tritonBLAS a practical drop-in replacement for empirical tuning in production HPC and ML workloads.
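To make the idea concrete, the kind of prediction the abstract describes (choosing tile sizes from matrix shape and a few architectural parameters, with no empirical search) can be sketched as below. This is a minimal illustrative heuristic, not the paper's actual model; the function name, the default values for shared-memory size and compute-unit count, and the shrink rules are all assumptions.

```python
# Illustrative sketch (NOT tritonBLAS's real model): analytically pick GEMM
# tile sizes (BLOCK_M, BLOCK_N, BLOCK_K) from the matrix shape and coarse
# hardware parameters, instead of runtime autotuning.

def predict_tiles(M, N, K, lds_bytes=65536, elem_size=2, num_cus=104):
    """Start from large power-of-two tiles, shrink BLOCK_K until the A and B
    operand tiles fit in shared memory, then shrink the output tile until the
    grid exposes roughly one work-group per compute unit."""
    bm, bn, bk = 128, 128, 64

    def lds_use(bm, bn, bk):
        # Bytes needed to stage one A tile (bm x bk) and one B tile (bk x bn).
        return (bm * bk + bk * bn) * elem_size

    def grid(bm, bn):
        # Number of output tiles: ceil(M / bm) * ceil(N / bn).
        return -(-M // bm) * -(-N // bn)

    # Shared-memory capacity constraint.
    while bk > 16 and lds_use(bm, bn, bk) > lds_bytes:
        bk //= 2

    # Occupancy constraint: small problems get smaller output tiles.
    while bm > 16 and grid(bm, bn) < num_cus:
        bm //= 2
        if bn > 16 and grid(bm, bn) < num_cus:
            bn //= 2

    return bm, bn, bk
```

For a large square problem such as M = N = K = 4096 with the default parameters, the capacity and occupancy constraints are already satisfied, so the sketch keeps the full 128x128x64 tiling; a real model would additionally account for cache topology and code/data placement, as the abstract states.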