Exploring Performance-Productivity Trade-offs in AMT Runtimes: A Task Bench Study of Itoyori, ItoyoriFBC, HPX, and MPI

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic comparison between Asynchronous Many-Task (AMT) runtime systems and MPI in terms of both performance and programming productivity. Leveraging the Task Bench framework, it presents the first unified benchmark incorporating Itoyori and ItoyoriFBC alongside HPX and MPI, evaluating them across diverse workloads using PGAS abstractions, RDMA-based work stealing, and future-based synchronization. Quantitative analysis via application efficiency, METG, lines of code, and library constructs reveals distinct trade-offs: Itoyori achieves the highest efficiency with the most concise code; MPI excels in regular, low-communication tasks yet requires verbose implementations; HPX demonstrates robust stability but the lowest productivity; and ItoyoriFBC offers enhanced expressiveness at a modest performance cost.

📝 Abstract
Asynchronous Many-Task (AMT) runtimes offer a productive alternative to the Message Passing Interface (MPI). However, the diverse AMT landscape makes fair comparisons challenging. Task Bench, proposed by Slaughter et al., addresses this challenge through a parameterized framework for evaluating parallel programming systems. This work integrates two recent cluster AMTs, Itoyori and ItoyoriFBC, into Task Bench for comprehensive evaluation against MPI and HPX. Itoyori employs a Partitioned Global Address Space (PGAS) model with RDMA-based work stealing, while ItoyoriFBC extends it with future-based synchronization. We evaluate these systems in terms of both performance and programmer productivity. Performance is assessed across various configurations, including compute-bound kernels, weak scaling, and both imbalanced and communication-intensive patterns. Performance is quantified using application efficiency, i.e., the percentage of maximum performance achieved, and the Minimum Effective Task Granularity (METG), i.e., the smallest task duration before runtime overheads dominate. Programmer productivity is quantified using Lines of Code (LOC) and the Number of Library Constructs (NLC). Our results reveal distinct trade-offs. MPI achieves the highest efficiency for regular, communication-light workloads but requires verbose, low-level code. HPX maintains stable efficiency under load imbalance across varying node counts, yet ranks last in productivity metrics, demonstrating that AMTs do not inherently guarantee improved productivity over MPI. Itoyori achieves the highest efficiency in communication-intensive configurations while leading in programmer productivity. ItoyoriFBC exhibits slightly lower efficiency than Itoyori, though its future-based synchronization offers potential for expressing irregular workloads.
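The abstract's two performance metrics can be made concrete with a small sketch. The following is an illustrative Python example (not code from the paper): application efficiency is the fraction of peak performance achieved, and METG is the smallest task granularity at which efficiency stays above a threshold (Task Bench conventionally uses METG(50%)). The sample data points below are hypothetical.

```python
def application_efficiency(achieved, peak):
    """Fraction of peak performance achieved (0.0 to 1.0)."""
    return achieved / peak

def metg(samples, threshold=0.5):
    """Minimum Effective Task Granularity.

    samples: list of (task_granularity_in_us, efficiency) pairs measured
    by shrinking the task size until runtime overheads dominate.
    Returns the smallest granularity whose efficiency meets the
    threshold, or None if no sample qualifies.
    """
    eligible = [g for g, eff in samples if eff >= threshold]
    return min(eligible) if eligible else None

# Hypothetical measurements: overhead dominates below ~10 us tasks.
samples = [(1000, 0.98), (100, 0.95), (10, 0.60), (1, 0.12)]
print(metg(samples))        # -> 10   (METG at the default 50% threshold)
print(metg(samples, 0.9))   # -> 100  (a stricter 90% threshold)
```

A lower METG indicates that a runtime sustains useful efficiency with finer-grained tasks, which is why the paper uses it alongside raw application efficiency.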
Problem

Research questions and friction points this paper is trying to address.

Asynchronous Many-Task
performance-productivity trade-offs
parallel programming systems
runtime comparison
programmer productivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Asynchronous Many-Task (AMT)
Task Bench
Partitioned Global Address Space (PGAS)
future-based synchronization
performance-productivity trade-off
🔎 Similar Papers
2024-02-14 · Proceedings of the 39th ACM International Conference on Supercomputing · Citations: 3
Torben R. Lahnor (University of Kassel, Kassel, Germany)
Mia Reitz (University of Kassel, Kassel, Germany)
Jonas Posner (Fulda University of Applied Sciences, Fulda, Germany)
Patrick Diehl (Los Alamos National Laboratory)
Tags: Crack and fracture mechanics · Peridynamics · HPC · HPX · Asynchronous Many-Tasking Runtimes