GPU-Accelerated Algorithms for Process Mapping

📅 2025-10-14

🤖 AI Summary

This work addresses the classical process mapping problem: assigning the vertices of a task graph to the processing elements of a supercomputer so that computational load is balanced and communication cost is minimized. It introduces GPU acceleration to this problem with two algorithms: (1) hierarchical multisection, which partitions the task graph along the machine hierarchy using GPU-based graph partitioners, and (2) a multilevel graph partitioning pipeline with process mapping integrated directly and with GPU-accelerated coarsening and refinement. In the experiments, both methods reach speedups exceeding 300× over state-of-the-art CPU-based algorithms. Algorithm (1) incurs about 10% higher communication cost on average and so remains competitive with CPU algorithms. Algorithm (2) is much faster, with a geometric mean speedup of 77.6× and a peak speedup of 598×, at the cost of lower solution quality. According to the authors, these are the first GPU-based algorithms for process mapping.

📝 Abstract

Process mapping asks for an assignment of the vertices of a task graph to the processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph partitioners, we propose two GPU-accelerated algorithms for this optimization problem. The first algorithm employs hierarchical multisection, which partitions the task graph along the hierarchy of the supercomputer. The method utilizes GPU-based graph partitioners to accelerate the mapping process. The second algorithm integrates process mapping directly into the modern multilevel graph partitioning pipeline. Vital phases like coarsening and refinement are accelerated by exploiting the parallelism of GPUs. In our experiments, both methods achieve speedups exceeding 300 when compared to state-of-the-art CPU-based algorithms. The first algorithm has, on average, about 10 percent greater communication costs and thus remains competitive with CPU algorithms. The second approach is much faster, with a geometric mean speedup of 77.6 and peak speedup of 598, at the cost of lower solution quality. To our knowledge, these are the first GPU-based algorithms for process mapping.
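
The objective described in the abstract can be made concrete with a small sketch. It assumes a homogeneous machine hierarchy given by per-level fan-outs and per-level communication distances, which is a common way to model process mapping but is not spelled out on this page. The function names, the example graph, and the distance values are illustrative assumptions, not the authors' code.

```python
def pe_distance(p, q, hierarchy, distances):
    """Distance between processing elements p and q.

    Assumption: hierarchy lists fan-outs from the lowest level up
    (e.g. [2, 2] = 2 cores per node, 2 nodes) and distances[i] is the
    cost of communicating across level i.
    """
    if p == q:
        return 0
    group = 1
    for size, dist in zip(hierarchy, distances):
        group *= size
        # p and q share a group at this level, so this level's cost applies
        if p // group == q // group:
            return dist
    raise ValueError("PE id outside the hierarchy")


def communication_cost(edges, mapping, hierarchy, distances):
    """Sum of edge weight times PE distance over all task-graph edges."""
    return sum(
        w * pe_distance(mapping[u], mapping[v], hierarchy, distances)
        for u, v, w in edges
    )


def imbalance(weights, mapping, k):
    """Relative overload of the heaviest PE compared to the average load."""
    loads = [0] * k
    for v, pe in enumerate(mapping):
        loads[pe] += weights[v]
    return max(loads) / (sum(weights) / k) - 1.0


# Hypothetical example: a 4-cycle of tasks with two heavy edges.
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 5), (3, 0, 1)]
hierarchy, distances = [2, 2], [1, 10]

good = [0, 1, 2, 3]  # heavy edges stay inside a node
bad = [0, 2, 1, 3]   # heavy edges cross nodes
print(communication_cost(edges, good, hierarchy, distances))  # 30
print(communication_cost(edges, bad, hierarchy, distances))   # 120
print(imbalance([1, 1, 1, 1], good, 4))                       # 0.0
```

Both mappings are perfectly balanced, so the example isolates the effect of the mapping on communication cost: keeping heavily communicating tasks on nearby processing elements is what lowers the objective.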

Problem

Research questions and friction points this paper is trying to address.

Mapping task graphs to processing elements so that workload is balanced and communication cost is minimized, using GPUs instead of CPU-only solvers

Exploiting the supercomputer hierarchy by partitioning the task graph along it (hierarchical multisection)

Accelerating the coarsening and refinement phases of the multilevel graph partitioning pipeline on GPUs

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-accelerated hierarchical multisection for process mapping

GPU integration into multilevel graph partitioning pipeline

Parallel coarsening and refinement phases using GPU acceleration
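
The idea behind hierarchical multisection can be sketched as a top-down recursion over the machine hierarchy. This is a minimal illustration under stated assumptions, not the authors' implementation: `placeholder_partition` is a hypothetical stand-in that splits a vertex list into contiguous chunks and ignores edges, where the paper uses GPU-based graph partitioners. The hierarchy encoding (fan-outs listed from the lowest level up) is also an assumption.

```python
from math import prod


def placeholder_partition(vertices, k):
    """Hypothetical stand-in for a real graph partitioner.

    Splits the vertex list into k near-equal contiguous chunks and
    ignores edges entirely; a real partitioner would also minimize the cut.
    """
    n = len(vertices)
    return [vertices[i * n // k:(i + 1) * n // k] for i in range(k)]


def hierarchical_multisection(vertices, hierarchy, partition=placeholder_partition):
    """Map vertices to PE ids by partitioning along the hierarchy.

    Assumption: hierarchy lists fan-outs from the lowest level up,
    e.g. [2, 2] = 2 cores per node, 2 nodes. Returns {vertex: PE id}.
    """
    mapping = {}

    def recurse(block, level, pe_offset):
        if level < 0:
            # A single PE is left: assign every vertex of the block to it.
            for v in block:
                mapping[v] = pe_offset
            return
        # Number of PEs below each child block at this level.
        pes_per_child = prod(hierarchy[:level])
        for i, child in enumerate(partition(block, hierarchy[level])):
            recurse(child, level - 1, pe_offset + i * pes_per_child)

    # Start at the top of the hierarchy and work down to single PEs.
    recurse(list(vertices), len(hierarchy) - 1, 0)
    return mapping


print(hierarchical_multisection(range(8), [2, 2]))
# {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
```

Because each level is split before the levels below it, vertices separated early end up on distant processing elements, while vertices that stay together until the last split share a node. Swapping `placeholder_partition` for a cut-minimizing partitioner is what would make the resulting mapping communication-aware.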
