Taking GPU Programming Models to Task for Performance Portability

📅 2024-02-14
🏛️ Proceedings of the 39th ACM International Conference on Supercomputing
📈 Citations: 3 (influential: 0)
🤖 AI Summary
Assessing performance portability across GPU programming models on heterogeneous NVIDIA and AMD hardware remains challenging due to fragmented benchmarks and irreproducible evaluation methodologies. Method: We conduct a systematic, cross-platform evaluation of seven programming models—CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL—using five interdisciplinary proxy applications. Leveraging a Spack-based automation framework, we ensure fully reproducible build, deployment, and benchmarking across real multi-vendor GPU systems. Contribution/Results: Our empirical study quantifies both performance consistency and migration overhead for each model. HIP and SYCL achieve superior performance on AMD GPUs; CUDA remains dominant on NVIDIA hardware; Kokkos and RAJA deliver balanced portability with moderate performance; OpenMP and OpenACC exhibit significant cross-platform performance degradation. This work provides the first unified, vendor-agnostic assessment of GPU programming models’ performance portability and establishes a rigorous, reproducible methodology to guide architecture selection for high-performance scientific software.

📝 Abstract
Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity in on-node hardware architectures increases. While several programming models provide portability for diverse GPU platforms, they make no guarantees about performance portability. In this work, we explore seven programming models -- CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL -- to study whether the performance of these models is consistently good across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains, create implementations where they are missing, and use them to present a comprehensive comparative evaluation of the programming models. We provide a Spack-based scripting methodology to ensure reproducibility of the experiments conducted in this work. Finally, we attempt to answer the question: to what extent does each programming model provide performance portability for heterogeneous systems in real-world usage?
Problem

Research questions and friction points this paper is trying to address.

Evaluating performance portability across GPU programming models
Assessing consistency on NVIDIA and AMD GPU architectures
Analyzing underperformance causes and providing optimizations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates performance portability across GPU programming models
Uses proxy applications from diverse scientific domains
Provides Spack-based methodology for experiment reproducibility
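The Spack-based reproducibility methodology amounts to pinning every build in a declarative environment. A minimal sketch of such an environment file follows; the specific packages, variants, and GPU architecture below are illustrative assumptions, not the paper's actual configuration.

```yaml
# Hypothetical spack.yaml environment sketch -- specs and variants
# are illustrative, not the paper's actual configuration.
spack:
  specs:
    - kokkos +cuda cuda_arch=80   # Kokkos built for an NVIDIA GPU
    - raja +cuda cuda_arch=80
    - hip                         # on AMD systems: kokkos +rocm, etc.
  concretizer:
    unify: true                   # one consistent dependency graph
  view: true                      # expose installs under a single prefix
```

Activating and installing such an environment (`spack env activate .` then `spack install`) reproduces the same concretized build on any machine, which is what makes cross-platform benchmarking results comparable.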