Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment

📅 2025-09-25
🤖 AI Summary
Sparse matrix permutation, extraction, and assignment on distributed-memory systems suffer from high communication overhead and poor scalability. To address this, we propose parallel algorithms based on the Identify-Exchange-Build (IEB) paradigm: each process identifies the local nonzeros to send, exchanges the required data, and builds its local submatrix from the received elements. The approach separates the communication and computation phases and uses synchronization-free multithreading to accelerate local submatrix construction. It supports use cases such as matrix permutation for load balancing, graph reordering, subgraph extraction, and streaming graph processing. Experiments on two university clusters and the Perlmutter supercomputer show that the method outperforms CombBLAS in all tested scenarios and PETSc in the scenarios where it was compared: it reduces communication volume by up to 37% and achieves strong scaling to over 10,000 cores. This work provides a scalable, low-overhead foundation for large-scale graph analytics and sparse linear algebra computations.

📝 Abstract
We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies the local nonzeros to be sent, exchanges the required data, and then builds its local submatrix from the received elements. This approach reduces communication compared to SpGEMM-based methods in distributed memory. By employing synchronization-free multithreaded algorithms, we further accelerate local computations, achieving substantially better performance than existing libraries such as CombBLAS and PETSc. We design efficient software for these operations and evaluate their performance on two university clusters and the Perlmutter supercomputer. Our experiments span a variety of application scenarios, including matrix permutation for load balancing, matrix reordering, subgraph extraction, and streaming graph applications. In all cases, we compare our algorithms against CombBLAS, the most comprehensive distributed library for these operations, and, in some scenarios, against PETSc. Overall, this work provides a comprehensive study of algorithms, software implementations, experimental evaluations, and applications for sparse matrix permutation, extraction, and assignment.
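The three IEB phases described above can be illustrated with a small single-process simulation of a matrix permutation. This is only a sketch under stated assumptions, not the paper's implementation: the 1D block-row layout, the triple-list storage, and the names `owner` and `permute_ieb` are all hypothetical, and the exchange step is a plain list move standing in for an all-to-all between processes.

```python
from collections import defaultdict


def owner(row, n, p):
    """Process that owns a global row under an assumed 1D block-row layout."""
    block = -(-n // p)  # ceil(n / p) rows per process
    return row // block


def permute_ieb(local_parts, row_perm, col_perm, n, p):
    """Simulate B[row_perm[i], col_perm[j]] = A[i, j] across p processes.

    local_parts[r] is the list of (i, j, value) triples held by process r.
    Returns the list of triples each process holds after the permutation.
    """
    # Identify: each process decides where every local nonzero must go.
    send = [defaultdict(list) for _ in range(p)]
    for r in range(p):
        for i, j, v in local_parts[r]:
            ni, nj = row_perm[i], col_perm[j]
            send[r][owner(ni, n, p)].append((ni, nj, v))

    # Exchange: stand-in for an all-to-all; here the data is just moved.
    recv = [[] for _ in range(p)]
    for r in range(p):
        for dest, triples in send[r].items():
            recv[dest].extend(triples)

    # Build: each process orders what it received into its local submatrix.
    return [sorted(part) for part in recv]


# A 4x4 matrix split over 2 processes (rows 0-1 and rows 2-3).
parts = [[(0, 0, 1.0), (1, 2, 2.0)], [(2, 1, 3.0), (3, 3, 4.0)]]
perm = [2, 0, 3, 1]  # old index -> new index, applied to rows and columns
permuted = permute_ieb(parts, perm, perm, n=4, p=2)
```

Each nonzero travels at most once, directly to its final owner, which is the intuition behind the reduced communication relative to expressing the permutation as a product with permutation matrices via SpGEMM.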
Problem

Research questions and friction points this paper is trying to address.

Developing scalable distributed algorithms for sparse matrix operations
Reducing communication overhead compared to SpGEMM-based methods
Accelerating local computations through synchronization-free multithreading
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify-Exchange-Build strategy for sparse matrix operations
Synchronization-free multithreaded algorithms for local computations
Communication reduction compared to SpGEMM-based methods
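
One common way to make a local build step synchronization-free is to give each thread a disjoint set of rows, so that no two threads ever write to the same output range. The sketch below illustrates that ownership pattern for building CSR arrays from received triples. Whether the paper uses this exact scheme is not stated in this summary; the function name `build_csr_parallel` and its parameters are hypothetical, and the counting and bucketing passes are kept serial for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def build_csr_parallel(triples, n_rows, n_threads=4):
    """Build CSR arrays from (row, col, value) triples with local row indices."""
    # Count nonzeros per row, then prefix-sum the counts into row pointers.
    row_ptr = [0] * (n_rows + 1)
    for i, _, _ in triples:
        row_ptr[i + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Bucket entries by row so each worker can find the rows it owns.
    by_row = [[] for _ in range(n_rows)]
    for i, j, v in triples:
        by_row[i].append((j, v))

    col_idx = [0] * len(triples)
    values = [0.0] * len(triples)

    def fill(rows):
        # A worker writes only to [row_ptr[i], row_ptr[i + 1]) for its own
        # rows, so output ranges never overlap and no lock is needed.
        for i in rows:
            pos = row_ptr[i]
            for j, v in sorted(by_row[i]):
                col_idx[pos] = j
                values[pos] = v
                pos += 1

    # Split rows into contiguous chunks, one chunk per worker task.
    chunk = max(1, -(-n_rows // n_threads))
    ranges = [range(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(fill, ranges))
    return row_ptr, col_idx, values


row_ptr, col_idx, values = build_csr_parallel(
    [(1, 2, 5.0), (0, 1, 1.0), (1, 0, 3.0), (2, 2, 7.0)], n_rows=3
)
```

In CPython the global interpreter lock prevents a real speedup here, so the example shows only the disjoint-ownership idea; a compiled implementation would apply the same partitioning with native threads.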