SUperman: Efficient Permanent Computation on GPUs

📅 2025-02-23
🤖 AI Summary
This work addresses the efficient computation of matrix permanents, a #P-complete problem. We propose the first parallel software framework designed for multi-GPU clusters, with computational strategies tailored to each matrix type, including real, complex, binary, sparse, and dense matrices. Our approach integrates divide-and-conquer with dynamic programming and leverages CUDA, MPI, mixed-precision arithmetic, and sparsity-aware scheduling to jointly optimize intra-node (multi-GPU) and inter-node execution. To our knowledge, this is the first scalable distributed permanent computation on multi-GPU systems, addressing both single-node performance bottlenecks and cluster-wide communication overhead. On a single NVIDIA A100 GPU, our implementation achieves up to an 86× speedup over a state-of-the-art parallel algorithm running on 44 CPU cores. Combining multiple GPUs, we also compute the permanent of a 56×56 matrix, the largest instance reported in the literature to date.
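Distributing the work over many GPUs and nodes is possible because Ryser-type formulas are sums of independent subset terms: contiguous ranges of the $2^{n-1}$ terms can be assigned to different devices and combined with one final reduction. The Python sketch below illustrates this under stated assumptions. It uses the textbook Gray-code (Nijenhuis and Wilf) variant of Ryser's formula, which the summary does not confirm as SUperman's exact kernel, and the names `ryser_chunk`, `chunked_permanent`, and `num_workers` are hypothetical. Each chunk rebuilds its starting row sums from scratch, so it needs no state from other chunks; the "workers" here run sequentially rather than on GPUs.

```python
from itertools import permutations
from math import prod


def gray(k: int) -> int:
    """Binary-reflected Gray code of k."""
    return k ^ (k >> 1)


def ryser_chunk(a, start, stop):
    """Partial Nijenhuis-Wilf/Ryser sum over Gray-code indices start..stop-1.

    Each index k encodes a subset S of the first n-1 columns via gray(k);
    the term is (-1)^|S| * prod_i (x_i + sum_{j in S} a[i][j]).
    """
    n = len(a)
    # x_i = a[i][n-1] - (1/2) * (sum of row i)
    x = [row[n - 1] - sum(row) / 2 for row in a]
    # Build the row sums for this chunk's first subset directly, so the chunk
    # needs no state from any other chunk (this is what makes it parallel).
    g = gray(start)
    for j in range(n - 1):
        if g >> j & 1:
            for i in range(n):
                x[i] += a[i][j]
    total = 0.0
    for k in range(start, stop):
        if k > start:
            # gray(k) and gray(k-1) differ only in the lowest set bit of k.
            j = (k & -k).bit_length() - 1
            sign = 1 if gray(k) >> j & 1 else -1  # column j added or removed
            for i in range(n):
                x[i] += sign * a[i][j]
        parity = -1 if bin(gray(k)).count("1") % 2 else 1  # (-1)^|S|
        total += parity * prod(x)  # O(n) work per subset
    return total


def chunked_permanent(a, num_workers=4):
    """Split the 2^(n-1) subsets into contiguous chunks and add the partial sums."""
    n = len(a)
    iterations = 1 << (n - 1)
    bounds = [iterations * w // num_workers for w in range(num_workers + 1)]
    # In a multi-GPU/MPI setting each chunk would run on its own device or rank;
    # here the hypothetical "workers" simply run one after another.
    partials = [ryser_chunk(a, lo, hi) for lo, hi in zip(bounds, bounds[1:])]
    return (-1) ** (n - 1) * 2 * sum(partials)


def permanent_bruteforce(a):
    """Definition-based O(n * n!) reference: sum over all permutations."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
    print(chunked_permanent(A, num_workers=3), permanent_bruteforce(A))
```

The brute-force `permanent_bruteforce` is included only to check results on small matrices. Rebuilding a chunk's starting row sums costs $O(n^2)$, which is negligible next to the chunk's $O(n)$-per-subset loop once chunks are large; the only communication in this scheme is the final sum of partial results.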

📝 Abstract
The permanent is a function, defined for square matrices, with applications in various domains including quantum computing, statistical physics, complexity theory, combinatorics, and graph theory. Its formula is similar to that of the determinant; however, unlike the determinant, its exact computation is #P-complete, i.e., there is no polynomial-time algorithm for the permanent unless P=NP. For an $n \times n$ matrix, the fastest known exact algorithm has a time complexity of $O(2^{n-1}n)$. Although supercomputers have been employed for permanent computation before, there is no prior work and, more importantly, no publicly available software that leverages cutting-edge yet widely accessible High-Performance Computing accelerators such as GPUs. In this work, we designed, developed, and investigated the performance of SUperman, a complete software suite that can compute matrix permanents on multiple nodes/GPUs of a cluster while handling various matrix types (real/complex/binary and sparse/dense), with a dedicated treatment for each type. Compared to a state-of-the-art parallel algorithm on 44 cores, SUperman can be $86\times$ faster on a single NVIDIA A100 GPU. Combining multiple GPUs, we also showed that SUperman can compute the permanent of a $56 \times 56$ matrix, the largest reported in the literature.
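For reference, the comparison with the determinant and the $O(2^{n-1}n)$ bound can be made concrete with the textbook definitions below. This is the standard formulation (Ryser's formula and its Gray-code variant due to Nijenhuis and Wilf); the abstract does not state which variant SUperman builds on.

$$\operatorname{perm}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)}, \qquad \det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}$$

$$\operatorname{perm}(A) = (-1)^{n} \sum_{S \subseteq \{1,\dots,n\}} (-1)^{|S|} \prod_{i=1}^{n} \sum_{j \in S} a_{ij} \qquad \text{(Ryser)}$$

$$x_i = a_{in} - \frac{1}{2}\sum_{j=1}^{n} a_{ij}, \qquad \operatorname{perm}(A) = (-1)^{n-1}\, 2 \sum_{S \subseteq \{1,\dots,n-1\}} (-1)^{|S|} \prod_{i=1}^{n} \Big( x_i + \sum_{j \in S} a_{ij} \Big)$$

Visiting the $2^{n-1}$ subsets $S$ in Gray-code order changes $S$ by one column per step, so the $n$ inner sums are updated in $O(n)$ and the product costs $O(n)$, giving $O(2^{n-1}n)$ in total. For example, with $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $\operatorname{perm}(A) = 1 \cdot 4 + 2 \cdot 3 = 10$, whereas $\det(A) = 1 \cdot 4 - 2 \cdot 3 = -2$.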
Problem

Research questions and friction points this paper is trying to address.

Efficient permanent computation on GPUs
Handling various matrix types effectively
Scaling permanent computation to larger matrices than previously reported
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-accelerated permanent computation
Handles multiple matrix types
Scalable across multiple GPUs
Deniz Elbek
Sabanci University, Faculty of Engineering and Natural Sciences, Istanbul, Türkiye
Fatih Taşyaran
Sabanci University, Faculty of Engineering and Natural Sciences, Istanbul, Türkiye
Bora Uçar
CNRS and ENS Lyon
Combinatorial scientific computing, parallel computing, sparse matrix computations, graph algorithms
Kamer Kaya
Assoc. Prof., Sabancı University
High Performance Computing, Parallel Algorithms, Graph Algorithms, Cryptography