Taking GPU Programming Models to Task for Performance Portability

📅 2024-02-14

🏛️ Proceedings of the 39th ACM International Conference on Supercomputing

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Problem: Assessing performance portability across GPU programming models on heterogeneous NVIDIA and AMD hardware remains challenging due to fragmented benchmarks and irreproducible evaluation methodologies.

Method: We conduct a systematic, cross-platform evaluation of seven programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) using five interdisciplinary proxy applications. Leveraging a Spack-based automation framework, we ensure fully reproducible build, deployment, and benchmarking across real multi-vendor GPU systems.

Contribution/Results: Our empirical study quantifies both performance consistency and migration overhead for each model. HIP and SYCL achieve superior performance on AMD GPUs; CUDA remains dominant on NVIDIA hardware; Kokkos and RAJA deliver balanced portability with moderate performance; OpenMP and OpenACC exhibit significant cross-platform performance degradation. This work provides the first unified, vendor-agnostic assessment of GPU programming models’ performance portability and establishes a rigorous, reproducible methodology to guide architecture selection for high-performance scientific software.

📝 Abstract

Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity in on-node hardware architectures increases. While several programming models provide portability across diverse GPU platforms, they do not make any guarantees about performance portability. In this work, we explore several programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) to study whether their performance is consistently good across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains, create implementations where they are missing, and use them to present a comprehensive comparative evaluation of the programming models. We provide a Spack scripting-based methodology to ensure reproducibility of the experiments conducted in this work. Finally, we attempt to answer the question: to what extent does each programming model provide performance portability for heterogeneous systems in real-world usage?
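To make "consistently good across NVIDIA and AMD GPUs" concrete, the following is a minimal Python sketch of one way such a comparison could be tabulated. It is not the paper's methodology: the function names, the choice of the vendor-native model as the per-platform baseline, and the "worst" and "spread" summaries are all assumptions made for illustration, and the runtimes are placeholder numbers rather than measurements from the study.

```python
def relative_efficiency(native_seconds, model_seconds):
    """Fraction of the native baseline's speed a model reaches (1.0 = parity)."""
    if native_seconds <= 0 or model_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return native_seconds / model_seconds


def consistency_report(runtimes, baselines):
    """Summarize each model's efficiency relative to the native baseline.

    runtimes:  {platform: {model: seconds}}
    baselines: {platform: name of the native model on that platform}
    Returns {model: {"per_platform": {...}, "worst": float, "spread": float}}.
    """
    report = {}
    for platform, by_model in runtimes.items():
        native = by_model[baselines[platform]]
        for model, seconds in by_model.items():
            entry = report.setdefault(model, {"per_platform": {}})
            entry["per_platform"][platform] = relative_efficiency(native, seconds)
    for entry in report.values():
        effs = list(entry["per_platform"].values())
        entry["worst"] = min(effs)               # lowest efficiency on any platform
        entry["spread"] = max(effs) - min(effs)  # 0.0 means identical efficiency everywhere
    return report


# Placeholder runtimes for illustration only; NOT measurements from the paper.
example_runtimes = {
    "nvidia": {"CUDA": 1.0, "Kokkos": 1.25, "OpenMP": 2.0},
    "amd": {"HIP": 1.0, "Kokkos": 1.25, "OpenMP": 4.0},
}
example_baselines = {"nvidia": "CUDA", "amd": "HIP"}  # assumed native models

report = consistency_report(example_runtimes, example_baselines)
for model, entry in sorted(report.items()):
    print(model, entry["per_platform"], entry["worst"], entry["spread"])
```

Under this hypothetical summary, a model with a high "worst" value and a small "spread" would count as consistently good, while a large spread signals platform-dependent performance.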

Problem

Research questions and friction points this paper is trying to address.

Evaluating performance portability across GPU programming models

Assessing consistency on NVIDIA and AMD GPU architectures

Analyzing underperformance causes and providing optimizations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates performance portability across seven GPU programming models

Uses proxy applications from diverse scientific domains

Provides Spack-based methodology for experiment reproducibility
