🤖 AI Summary
Porting legacy scientific simulation codes, such as Fortran 2003 multigrid solvers, to heterogeneous architectures like GPUs is costly and usually requires intrusive changes. This paper explores automated code translation as a low-effort alternative that avoids intrusive changes to the existing codebase. A thin, reusable translation layer parses Fortran 2003 at compile time and interfaces with the existing Loopy library to transpile to C++/GPU code, which a custom MPI runtime system then manages. The reported payoff is an approximately 2-3x speedup over a full CPU socket, and 6x in multi-node settings. The approach lowers the engineering effort needed to run legacy scientific solvers on GPU hardware.
📝 Abstract
Legacy codes are in ubiquitous use in scientific simulations; they are well-tested and there is significant time investment in their use. However, one challenge is the adoption of new, sometimes incompatible computing paradigms, such as GPU hardware. In this paper, we explore using automated code translation to enable execution of legacy multigrid solver code on GPUs without significant time investment and while avoiding intrusive changes to the codebase. We developed a thin, reusable translation layer that parses Fortran 2003 at compile time, interfacing with the existing library Loopy to transpile to C++/GPU code, which is then managed by a custom MPI runtime system that we created. With this low-effort approach, we are able to achieve a payoff of an approximately 2-3x speedup over a full CPU socket, and 6x in multi-node settings.
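The parse-then-transpile idea described above can be illustrated with a deliberately tiny sketch. This is not the authors' translation layer and it does not use Loopy; the function name `translate_do_loop`, the regex-based parsing, and the restriction to a single-level DO loop with one assignment are all assumptions made purely for illustration. It shows only the core mapping such a tool has to perform: one loop iteration becomes one GPU thread, and Fortran's default 1-based indexing is shifted to 0-based C indexing.

```python
import re

# Hypothetical toy pattern: a single-level DO loop with exactly one statement.
_DO_LOOP = re.compile(
    r"^\s*do\s+(?P<var>\w+)\s*=\s*(?P<lo>\w+)\s*,\s*(?P<hi>\w+)\s*\n"
    r"\s*(?P<body>[^\n]+?)\s*\n"
    r"\s*end\s*do\s*$",
    re.IGNORECASE,
)

# A name followed by one parenthesised index, e.g. x(i).
# Limitation of this sketch: a function call such as sin(i) would also match.
_ARRAY_REF = re.compile(r"(?P<name>\w+)\((?P<index>\w+)\)")


def translate_do_loop(fortran_src: str) -> str:
    """Turn a simple Fortran DO loop into a CUDA-style kernel body (as text)."""
    match = _DO_LOOP.match(fortran_src.strip("\n"))
    if match is None:
        raise ValueError("unsupported loop shape for this toy translator")

    var, lo, hi = match["var"], match["lo"], match["hi"]

    # Fortran arrays are 1-based by default, C arrays are 0-based.
    body = _ARRAY_REF.sub(
        lambda ref: f"{ref['name']}[{ref['index']} - 1]", match["body"]
    )

    return "\n".join(
        [
            # Each GPU thread handles one iteration of the original loop.
            f"int {var} = {lo} + blockIdx.x * blockDim.x + threadIdx.x;",
            # Guard against threads that fall past the upper loop bound.
            f"if ({var} <= {hi}) {{",
            f"    {body};",
            "}",
        ]
    )


if __name__ == "__main__":
    source = "do i = 1, n\n  y(i) = a * x(i) + y(i)\nend do"
    print(translate_do_loop(source))
```

A real pipeline would work on a parsed syntax tree rather than regular expressions, and would also need to infer argument types, handle nested loops, and manage data movement between host and device, none of which this sketch attempts.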