Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the high overhead and poor portability associated with overdecomposition on distributed-memory GPGPU platforms by introducing a novel runtime mechanism built on Charm++. It combines optimized task scheduling, a unified device abstraction layer, and efficient communication management. The result is the first portable support for overdecomposition across GPU vendors (such as NVIDIA and AMD) and diverse interconnect architectures. The method substantially reduces runtime overhead and demonstrates strong scalability and performance in representative applications such as adaptive mesh refinement and tree codes. This addresses the performance limitations of conventional GPGPU programming in heterogeneous environments.
๐Ÿ“ Abstract
Overdecomposition has emerged as a powerful and sometimes essential technique in parallel programming. Many application domains and frameworks use it, including those based on adaptive mesh refinement or tree codes. Charm++ is a parallel programming system that has demonstrated the utility of overdecomposition for many applications and in multiple contexts. However, the emergence of GPGPUs as a dominant compute component has created some real and perceived challenges for this paradigm. The main challenge is the higher overhead brought about by overpartitioning, that is, having multiple objects assigned to the same GPGPU device. We address this issue, as well as the issue of portability, by developing techniques and software. These demonstrate that overdecomposition can be efficiently and productively supported across combinations of GPU vendors and interconnection networks.
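The following minimal sketch makes "overpartitioning" concrete. It is hypothetical: it is not Charm++ code and not the paper's runtime. A domain of cells is split into more work objects than there are GPUs, with the object count set by an assumed overdecomposition ratio. The objects are then placed onto devices with a simple round-robin rule. The names `overdecompose`, `map_to_devices`, and `ratio` are illustrative. The sketch also assumes that each object launches its own kernel per step. That assumption is what makes launches per device grow with the ratio, the kind of overhead the abstract refers to.

```python
from collections import defaultdict


def overdecompose(num_cells: int, num_objects: int) -> list[range]:
    """Split cells 0..num_cells-1 into num_objects contiguous blocks.

    Block sizes differ by at most one cell.
    """
    base, extra = divmod(num_cells, num_objects)
    blocks = []
    start = 0
    for i in range(num_objects):
        # The first `extra` blocks each take one leftover cell.
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def map_to_devices(num_objects: int, num_devices: int) -> dict[int, list[int]]:
    """Hypothetical static round-robin placement of object ids onto device ids."""
    placement = defaultdict(list)
    for obj in range(num_objects):
        placement[obj % num_devices].append(obj)
    return dict(placement)


if __name__ == "__main__":
    num_cells, num_devices, ratio = 1_000_000, 4, 8  # illustrative values only
    num_objects = ratio * num_devices  # ratio > 1 means several objects per GPU
    blocks = overdecompose(num_cells, num_objects)
    placement = map_to_devices(num_objects, num_devices)
    for dev, objs in placement.items():
        cells = sum(len(blocks[o]) for o in objs)
        # Assumption: one kernel launch per object per step, so launches == objects.
        print(f"GPU {dev}: {len(objs)} objects, {cells} cells, "
              f"{len(objs)} kernel launches/step")
```

With these values, each of the four GPUs hosts 8 objects covering 250000 cells. Under the stated assumption, each GPU also issues 8 launches per step instead of 1. Runtime systems for overdecomposition can use the extra objects for load balancing and for overlapping communication with computation. The cost is per-object overheads like these, which is the trade-off the paper targets.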
Problem

Research questions and friction points this paper is trying to address.

overdecomposition
GPGPU
distributed memory
portability
overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

overdecomposition
GPGPU
distributed memory
portability
parallel programming