🤖 AI Summary
Problem: Assessing the performance portability of GPU programming models across heterogeneous NVIDIA and AMD hardware remains challenging because benchmarks are fragmented and evaluation methodologies are often irreproducible.
Method: We conduct a systematic, cross-platform evaluation of seven programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) using five proxy applications drawn from different scientific domains. A Spack-based automation framework makes building, deploying, and benchmarking fully reproducible across real multi-vendor GPU systems.
Contribution/Results: Our empirical study quantifies both the performance consistency and the migration overhead of each model. HIP and SYCL achieve superior performance on AMD GPUs, and CUDA remains dominant on NVIDIA hardware. Kokkos and RAJA deliver balanced portability with moderate performance, while OpenMP and OpenACC exhibit significant cross-platform performance degradation. This work provides the first unified, vendor-agnostic assessment of the performance portability of GPU programming models. It also establishes a rigorous, reproducible methodology to guide the choice of programming model for high-performance scientific software.
📝 Abstract
Portability is critical to ensuring high productivity in developing and maintaining scientific software as on-node hardware architectures grow more diverse. While several programming models provide portability across diverse GPU platforms, they do not guarantee performance portability. In this work, we explore several programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) to study whether their performance is consistently good across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains and create implementations where they are missing. We then use them to present a comprehensive comparative evaluation of the programming models. We provide a Spack scripting-based methodology to ensure the reproducibility of the experiments conducted in this work. Finally, we attempt to answer the question: to what extent does each programming model provide performance portability for heterogeneous systems in real-world usage?
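One widely used way to make "performance portability" quantitative is the metric proposed by Pennycook, Sewall, and Lee. It takes the harmonic mean of an application's efficiency across a set of platforms and defines the result as zero if any platform in the set is unsupported. The abstract does not state which metric this study uses, so the following is only an illustrative sketch under that assumption. The function name `performance_portability` and the platform keys are hypothetical, and the efficiency values are made up; they are not results from this work.

```python
from statistics import harmonic_mean
from typing import Dict, Optional


def performance_portability(efficiencies: Dict[str, Optional[float]]) -> float:
    """Sketch of the Pennycook-style PP metric (assumed, not the paper's code).

    `efficiencies` maps a platform name to the application's efficiency on it,
    a value in (0, 1], or None if the code does not run on that platform.
    """
    values = list(efficiencies.values())
    # An empty set, or any unsupported or zero-efficiency platform, yields PP = 0.
    if not values or any(e is None or e <= 0 for e in values):
        return 0.0
    # The harmonic mean penalizes a single poorly performing platform heavily.
    return harmonic_mean(values)


if __name__ == "__main__":
    # Hypothetical efficiencies for one programming model on two GPU vendors.
    print(performance_portability({"nvidia_gpu": 0.9, "amd_gpu": 0.6}))  # ≈ 0.72
    print(performance_portability({"nvidia_gpu": 0.9, "amd_gpu": None}))  # 0.0
```

The harmonic mean is used instead of the arithmetic mean so that a model that runs well on one vendor but poorly on another cannot hide the weak platform behind the strong one.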