Automated MPI-X code generation for scalable finite-difference solvers

📅 2023-12-20

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the demand for large-scale PDE simulations in seismic and medical imaging, this paper proposes a fully automated, high-performance code-generation method tailored to explicit finite-difference (FD) stencils. The approach integrates MPI-X (including UCX and shared memory) distributed-parallel code generation into the compilation pipeline of the Devito domain-specific language (DSL). This automates the path from symbolic modeling to HPC-ready code without source-code modifications and supports scalable execution on both CPUs and GPUs. Key techniques include symbolic differentiation, loop optimization, communication–computation overlap, and GPU offloading. Experiments on multi-node CPU/GPU clusters demonstrate excellent strong and weak scaling, a substantial reduction in execution time, and a decrease of over 70% in developer effort for coding and performance tuning. The framework has been deployed in production-scale scientific computing tasks, including real-world seismic full-waveform inversion.

📝 Abstract

Partial differential equations (PDEs) are crucial in modeling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to execute explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modeling simulations for real-world applications at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions both in execution time and developer effort. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive strong and weak scaling on CPU and GPU clusters, proving its effectiveness and capability to meet the demands of large-scale scientific simulations.
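
To make the idea of distributed-memory parallelism for explicit FD stencils concrete, here is a minimal sketch in plain Python. It is not Devito's generated code and does not use MPI. It simulates the "ranks" inside one process, assuming a 1D diffusion stencil, fixed boundary values, and a one-cell halo. All function names (`step`, `run_global`, `run_decomposed`) are hypothetical and chosen only for illustration. The point it shows is that a decomposed run with halo exchange before every time step reproduces the single-domain result.

```python
# Sketch only: 1D explicit finite-difference diffusion, run once on the whole
# domain and once split into simulated "ranks" that exchange one-cell halos.
# Names and the in-process "exchange" are illustrative assumptions, not Devito
# or MPI APIs.

def step(u, alpha):
    """Apply one explicit update to the interior of u; u[0], u[-1] are read only."""
    return [u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
            for i in range(1, len(u) - 1)]

def run_global(u0, alpha, nsteps):
    """Reference run on the undivided domain with fixed end values."""
    u = list(u0)
    for _ in range(nsteps):
        u = [u[0]] + step(u, alpha) + [u[-1]]
    return u

def run_decomposed(u0, alpha, nsteps, nranks):
    """Same computation with the interior split into nranks contiguous chunks."""
    interior = list(u0[1:-1])
    m = len(interior)
    if m < nranks:
        raise ValueError("each simulated rank needs at least one point")
    bounds = [r * m // nranks for r in range(nranks + 1)]
    chunks = [interior[bounds[r]:bounds[r + 1]] for r in range(nranks)]
    left_bc, right_bc = u0[0], u0[-1]
    for _ in range(nsteps):
        # Halo exchange: each chunk is padded with its neighbours' edge values
        # (or the physical boundary value at the ends of the domain).
        padded = []
        for r, chunk in enumerate(chunks):
            left = left_bc if r == 0 else chunks[r - 1][-1]
            right = right_bc if r == nranks - 1 else chunks[r + 1][0]
            padded.append([left] + chunk + [right])
        # Local stencil update: each rank touches only its own padded chunk.
        chunks = [step(p, alpha) for p in padded]
    return [left_bc] + [v for c in chunks for v in c] + [right_bc]

if __name__ == "__main__":
    u0 = [0.0] * 16
    u0[8] = 1.0  # initial spike in the middle of the domain
    ref = run_global(u0, alpha=0.25, nsteps=10)
    dec = run_decomposed(u0, alpha=0.25, nsteps=10, nranks=4)
    print(max(abs(a - b) for a, b in zip(ref, dec)))
```

In a real distributed run the padding step would be a message exchange between neighbouring processes, and overlapping that exchange with the update of points that do not depend on the halo is one way to hide communication cost.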

Problem

Research questions and friction points this paper is trying to address.

Finite Difference Solvers

Partial Differential Equations

Code Generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Code Generation

Parallel Computing

Devito Tool

🔎 Similar Papers

A shared compilation stack for distributed-memory parallelism in stencil DSLs