Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe

📅 2025-06-26
🤖 AI Summary
To address the low development efficiency and poor portability of high-performance computing (HPC) applications across heterogeneous GPU architectures (e.g., NVIDIA GH200/A100, AMD MI250X), this work expresses a critical tensor kernel in the Stateful Dataflow Multigraph (SDFG) representation of the DaCe data-centric framework. From this single high-level specification, DaCe automatically generates and optimizes code for multiple GPU backends, targeting the small-tensor operations characteristic of the spectral-element method. The generated code is integrated into Neko, a Fortran-based computational fluid dynamics (CFD) solver. Experimental evaluation on the GH200, A100, and MI250X shows competitive performance for the automatically generated kernels, improving cross-architecture portability and reducing long-term maintenance overhead for scientific software.

📝 Abstract
With the emergence of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, efficiently targeting diverse hardware architectures has become a major challenge for HPC application developers. The increasing hardware diversity in HPC systems often necessitates the development of architecture-specific code, hindering the sustainability of large-scale scientific applications. In this work, we leverage DaCe, a data-centric parallel programming framework, to automate the generation of high-performance kernels. DaCe enables automatic code generation for multicore processors and various accelerators, reducing the burden on developers who would otherwise need to rewrite code for each new architecture. Our study demonstrates DaCe's capabilities by applying its automatic code generation to a critical computational kernel used in Computational Fluid Dynamics (CFD). Specifically, we focus on Neko, a Fortran-based solver that employs the spectral-element method, which relies on small tensor operations. We detail the formulation of this computational kernel using DaCe's Stateful Dataflow Multigraph (SDFG) representation and discuss how this approach facilitates high-performance code generation. Additionally, we outline the workflow for seamlessly integrating DaCe's generated code into the Neko solver. Our results highlight the portability and performance of the generated code across multiple platforms, including Nvidia GH200, Nvidia A100, and AMD MI250X GPUs, with competitive performance results. By demonstrating the potential of automatic code generation, we emphasise the feasibility of using portable solutions to ensure the long-term sustainability of large-scale scientific applications.
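To make "small tensor operations" concrete, the sketch below shows one generic spectral-element pattern: applying a small dense derivative matrix along one direction of every element's n x n x n block of values. It is an illustration only, not the kernel from the paper. The function name `apply_derivative_x`, the `u[e][k][j][i]` index layout, and the 2-point matrix `D` are all assumptions chosen for the example, and plain Python stands in for the DaCe/SDFG formulation, which this summary does not spell out.

```python
def apply_derivative_x(D, u):
    """Contract the n x n matrix D with the fastest-varying index of u.

    u is indexed as u[e][k][j][i] (element, then three local directions);
    the result is du[e][k][j][i] = sum over l of D[i][l] * u[e][k][j][l].
    The layout and the name are hypothetical, chosen for illustration.
    """
    n = len(D)
    return [
        [
            [
                # One small matrix-vector product per grid line of the element.
                [sum(D[i][l] * line[l] for l in range(n)) for i in range(n)]
                for line in plane
            ]
            for plane in element
        ]
        for element in u
    ]


# Assumed 2-point derivative matrix on the nodes x = -1 and x = 1.
D = [[-0.5, 0.5], [-0.5, 0.5]]
n = 2

# One element holding the field u = x, so du/dx should be 1 at every node.
u = [[[[-1.0, 1.0] for _ in range(n)] for _ in range(n)]]
du = apply_derivative_x(D, u)
```

Each element contributes only n^3 values and the matrix is n x n, so the work consists of many independent, very small contractions. That structure is why generic large-matrix routines tend to fit poorly and why the kernel is a natural target for generated, architecture-specific code.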
Problem

Research questions and friction points this paper is trying to address.

Automate high-performance kernel generation for diverse HPC accelerators
Reduce architecture-specific code development for CFD applications
Ensure portable solutions for sustainable large-scale scientific applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages DaCe for automatic high-performance kernel generation
Uses SDFG representation for portable code across architectures
Integrates generated code into Fortran-based Neko solver