Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem

📅 2025-09-25
🤖 AI Summary
This work addresses the longstanding challenge in scientific computing of balancing performance with developer productivity, compounded by poor portability across GPU platforms. To this end, the authors present a compilation framework for the Mojo language built on MLIR. The framework combines Python-ecosystem interoperability with CUDA-like syntax and introduces a compile-time programming model for scientific computing kernels, evaluated on a 7-point stencil, BabelStream, miniBUDE, and Hartree-Fock. Through MLIR-based lowering to LLVM, CUDA, and HIP backends, it natively supports both NVIDIA (H100) and AMD (MI300A) GPUs. Experiments show that memory-bound kernels perform on par with hand-optimized CUDA/HIP implementations, narrowing the toolchain fragmentation gap between AI and traditional HPC; kernels dominated by atomic operations or fast-math intrinsics remain targets for future optimization.

📝 Abstract
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on LLVM's Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python interoperability with CUDA-like syntax for compile-time portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations); and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo's performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps exist on AMD GPUs for atomic operations and for fast-math compute-bound kernels on both AMD and NVIDIA GPUs. Although the learning curve and programming requirements are still fairly low-level, Mojo can close significant gaps in the fragmented Python ecosystem at the convergence of scientific computing and AI.
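To see why the seven-point stencil is memory-bound: each update reads a grid point and its six axis-aligned neighbors but performs only a handful of multiply-adds, so bandwidth, not arithmetic, limits throughput. A minimal NumPy sketch of one sweep (the coefficients `c0` and `c1` and the function name are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def stencil_7pt(u, c0=0.5, c1=0.1):
    """One sweep of a 3-D seven-point stencil over interior points.

    Each interior point is updated from itself (weight c0) and its six
    face neighbors (weight c1); boundary points are left unchanged.
    """
    out = u.copy()
    out[1:-1, 1:-1, 1:-1] = (
        c0 * u[1:-1, 1:-1, 1:-1]
        + c1 * (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]    # x-axis neighbors
                + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]  # y-axis neighbors
                + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]) # z-axis neighbors
    )
    return out
```

Seven loads feed roughly eight floating-point operations per point, a low arithmetic intensity that makes such kernels a natural test of how close a new language gets to vendor memory bandwidth.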
Problem

Research questions and friction points this paper is trying to address.

Evaluating Mojo's performance and portability for scientific computing on GPUs
Comparing Mojo against CUDA and HIP on NVIDIA and AMD GPU architectures
Addressing Python ecosystem gaps in scientific computing and AI convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLIR-based Mojo language for GPU programming
Combines Python interoperability with CUDA-like syntax
Targets performance-portable scientific kernels on GPUs
William F. Godoy
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Tatiana Melnichenko
Innovative Computing Laboratory, The University of Tennessee, Knoxville; Oak Ridge National Laboratory, Oak Ridge, TN, USA
Pedro Valero-Lara
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Wael Elwasif
Oak Ridge National Laboratory
Philip Fackler
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Rafael Ferreira Da Silva
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Keita Teranishi
Oak Ridge National Laboratory
Jeffrey S. Vetter
Oak Ridge National Laboratory