DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP

📅 2025-06-03
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
To address portability bottlenecks in HPC caused by GPU heterogeneity and the difficulty of distributed memory management, this paper introduces DiOMP, a distributed OpenMP framework. DiOMP provides a unified runtime that integrates OpenMP target offloading with Partitioned Global Address Space (PGAS) semantics and supports both symmetric and asymmetric GPU memory allocation. It also incorporates OMPCCL, a lightweight, portable collective communication layer. The framework is implemented as LLVM/OpenMP extensions and supports the GASNet-EX and GPI-2 communication backends across NVIDIA, AMD, and Grace Hopper platforms. Evaluation on A100, Grace Hopper, and MI250X systems shows improved performance in micro-benchmarks and in applications such as matrix multiplication and Minimod, along with better scalability and programmability than traditional approaches.
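
For readers unfamiliar with the single-node feature that DiOMP builds on, the following is a minimal sketch of a matrix multiplication offloaded with standard OpenMP target directives. It uses only standard OpenMP, not any DiOMP API, and the function name `matmul_offload` is an assumption made for illustration. Distribution across nodes, which is the paper's contribution, is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: C = A * B for row-major n x n matrices.
 * The target construct asks the OpenMP runtime to run the loop nest on a
 * device; if OpenMP is disabled or no device is available, the same loops
 * run on the host. */
void matmul_offload(const double *a, const double *b, double *c, int n)
{
    assert(a && b && c && n > 0);
    size_t len = (size_t)n * (size_t)n;

    /* map(to:) copies the inputs to the device, map(from:) copies C back. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:len], b[0:len]) map(from: c[0:len])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

Calling `matmul_offload(a, b, c, n)` from a host program is enough to exercise it. In a plain OpenMP program, each node would still have to move its part of the data to other nodes through a separate communication library, which is the gap the summary says DiOMP targets.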

📝 Abstract
As core counts and heterogeneity rise in HPC, traditional hybrid programming models face challenges in managing distributed GPU memory and ensuring portability. This paper presents DiOMP, a distributed OpenMP framework that unifies OpenMP target offloading with the Partitioned Global Address Space (PGAS) model. Built atop LLVM/OpenMP and using GASNet-EX or GPI-2 for communication, DiOMP transparently handles global memory, supporting both symmetric and asymmetric GPU allocations. It leverages OMPCCL, a portable collective communication layer compatible with vendor libraries. DiOMP simplifies programming by abstracting device memory and communication, achieving superior scalability and programmability over traditional approaches. Evaluations on NVIDIA A100, Grace Hopper, and AMD MI250X show improved performance in micro-benchmarks and applications like matrix multiplication and Minimod, highlighting DiOMP's potential for scalable, portable, and efficient heterogeneous computing.
Problem

Research questions and friction points this paper is trying to address.

Managing distributed GPU memory in HPC systems
Ensuring portability across heterogeneous GPU architectures
Simplifying programming by abstracting device memory and communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies OpenMP offloading with PGAS model
Uses GASNet-EX or GPI-2 for communication
Leverages OMPCCL for portable collective communication
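
The difference between symmetric and asymmetric allocation in a PGAS model can be illustrated with a small address-translation sketch. It is a single-process toy model under stated assumptions and does not reflect DiOMP's actual runtime or API: `NRANKS`, `symmetric_remote_addr`, and `asymmetric_remote_addr` are hypothetical names, and segment bases are plain integers rather than real device pointers.

```c
#include <assert.h>
#include <stddef.h>

#define NRANKS 4 /* hypothetical number of ranks in the toy model */

/* Symmetric allocation: every rank allocates the object at the same offset
 * inside its own segment, so one offset is enough to address the object on
 * any target rank. */
size_t symmetric_remote_addr(const size_t seg_base[NRANKS],
                             size_t offset, int target)
{
    assert(target >= 0 && target < NRANKS);
    return seg_base[target] + offset;
}

/* Asymmetric allocation: each rank may place (and size) the object
 * differently, so the caller needs a per-rank offset table that the ranks
 * have exchanged beforehand. */
size_t asymmetric_remote_addr(const size_t seg_base[NRANKS],
                              const size_t obj_offset[NRANKS], int target)
{
    assert(target >= 0 && target < NRANKS);
    return seg_base[target] + obj_offset[target];
}
```

In this model the symmetric case needs no per-object metadata beyond a single offset, while the asymmetric case trades that simplicity for the freedom to give each rank a differently sized allocation.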