Multithreaded Fine-Grained Asynchronous BSP for Integer Sorting with LCI and OpenMP

📅 2026-05-24
🤖 AI Summary
This work addresses two problems. The traditional Bulk Synchronous Parallel (BSP) model handles irregular workloads inefficiently because of its global synchronization, and existing fine-grained asynchronous BSP (FA-BSP) implementations typically assume one process per core, which fits multicore architectures poorly. The authors propose a multithreaded FA-BSP approach that combines the Lightweight Communication Interface (LCI) with OpenMP, replacing synchronous MPI collectives with OpenMP threads and LCI's fine-grained, zero-copy active messages so that computation and communication can overlap. On the NAS Parallel Benchmark Integer Sort (IS), run with its original irregular Gaussian distribution to stress load balancing, the authors report that the multithreaded design significantly outperforms conventional bulk-synchronous MPI implementations, and they present it as a scalable option for irregular scientific applications.
📝 Abstract
The bulk synchronous parallel (BSP) model struggles with irregular workloads due to rigid global communication. While fine-grained asynchronous BSP (FA-BSP) improves overlap, existing implementations typically rely on a limiting one-process-per-core model. This paper proposes a multithreaded FA-BSP approach combining Lightweight Communication Interface (LCI) and OpenMP to fully exploit multicore architectures. We evaluate this design using the NAS Parallel Benchmark Integer Sort (IS), retaining the original irregular Gaussian distribution to rigorously test load balancing. By replacing synchronous MPI collectives with OpenMP multithreading and LCI's fine-grained, zero-copy active messages, we enable efficient computation-communication overlap. Our evaluation demonstrates that multithreaded FA-BSP significantly outperforms traditional bulk-synchronous MPI implementations, offering a scalable solution for irregular scientific applications.
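The idea in the abstract can be illustrated with a small, single-threaded simulation. This is a sketch under stated assumptions, not the authors' implementation: it uses no LCI, OpenMP, or MPI calls, and every name in it (gaussian_like_keys, Rank, on_key, simulate) is hypothetical. It assumes a bell-shaped key distribution obtained by averaging four uniform draws, and it assumes each rank owns an equal-width key range. Each key is delivered individually to its owner, whose handler updates a local histogram on arrival, mimicking fine-grained active messages instead of one bulk all-to-all exchange.

```python
import random


def gaussian_like_keys(n, max_key, seed=0):
    """Assumption: approximate a bell-shaped key distribution by
    averaging four uniform draws and scaling to [0, max_key)."""
    rng = random.Random(seed)
    keys = []
    for _ in range(n):
        avg = sum(rng.random() for _ in range(4)) / 4.0
        keys.append(min(int(max_key * avg), max_key - 1))
    return keys


class Rank:
    """Hypothetical stand-in for one process that owns keys in [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hist = [0] * (hi - lo)
        self.received = 0

    def on_key(self, key):
        # Handler run per incoming key, in the spirit of an active message:
        # the work happens on arrival, not after a global exchange phase.
        self.hist[key - self.lo] += 1
        self.received += 1

    def sorted_keys(self):
        # Expand the local histogram into this rank's sorted key run.
        out = []
        for offset, count in enumerate(self.hist):
            out.extend([self.lo + offset] * count)
        return out


def simulate(keys, max_key, num_ranks):
    """Route every key to its owning rank, then concatenate the sorted runs."""
    assert max_key % num_ranks == 0, "sketch assumes equal-width ranges"
    width = max_key // num_ranks
    ranks = [Rank(r * width, (r + 1) * width) for r in range(num_ranks)]
    for key in keys:
        ranks[key // width].on_key(key)  # one fine-grained "message" per key
    result = []
    for rank in ranks:
        result.extend(rank.sorted_keys())
    return result, [rank.received for rank in ranks]


if __name__ == "__main__":
    keys = gaussian_like_keys(100_000, 1 << 16)
    _, load = simulate(keys, 1 << 16, 8)
    print("keys received per rank:", load)
```

Under these assumptions the ranks that own the middle key ranges receive far more keys than the ranks at the edges, which is the kind of load imbalance the abstract says the Gaussian distribution is retained to test.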
Problem

Research questions and friction points this paper is trying to address.

Bulk Synchronous Parallel
Irregular Workloads
Fine-Grained Asynchronous BSP
Multicore Architectures
Load Balancing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multithreaded FA-BSP
LCI
OpenMP
Asynchronous BSP
Irregular Workloads