🤖 AI Summary
This work examines whether the Mojo language can ease the longstanding tension in scientific computing between performance and developer productivity, a tension made worse by poor portability across GPU vendors. Mojo is built on MLIR and combines Python interoperability with CUDA-like syntax in a compile-time programming model for portable GPU kernels. The study evaluates four scientific workloads (a seven-point stencil, BabelStream, miniBUDE, and Hartree-Fock) against vendor CUDA and HIP baselines on NVIDIA H100 and AMD MI300A GPUs. Memory-bound kernels perform competitively with the CUDA and HIP baselines. Gaps remain for atomic operations on AMD GPUs and for fast-math compute-bound kernels on both AMD and NVIDIA GPUs. Although programming in Mojo is still fairly low-level, the results suggest it can narrow the toolchain fragmentation between AI and traditional HPC within the Python ecosystem.
📝 Abstract
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on LLVM's Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python interoperability with CUDA-like syntax for compile-time portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations), and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo's performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps exist on AMD GPUs for atomic operations and on both AMD and NVIDIA GPUs for fast-math compute-bound kernels. Although Mojo still has a learning curve and its programming requirements remain fairly low-level, it can close significant gaps in the fragmented Python ecosystem at the convergence of scientific computing and AI.