Linear Complexity $\mathcal{H}^2$ Direct Solver for Fine-Grained Parallel Architectures

📅 2025-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses large dense matrices amenable to hierarchical representation, specifically strong-admissibility-based H² matrices, and proposes a linear-complexity direct solver designed for concurrent batch operations on fine-grained parallel architectures. The method uses strong recursive skeletonization factorization to compress remote interactions and is black-box: the only inputs are the matrix and the right-hand side, with no geometric or analytic information about the origin of the system. Parallelism comes from a multi-level matrix graph coloring, and prefix-sum memory management avoids dynamic memory allocations. Experiments on four representative families of dense matrices up to one million in size demonstrate linear scaling in both time and memory for the factorization and solution phases, with parallel scaling up to 16 threads. An experimental backward error analysis is included.

📝 Abstract
We present factorization and solution phases for a new linear complexity direct solver designed for concurrent batch operations on fine-grained parallel architectures, for matrices amenable to hierarchical representation. We focus on the strong-admissibility-based $\mathcal{H}^2$ format, where strong recursive skeletonization factorization compresses remote interactions. We build upon previous implementations of $\mathcal{H}^2$ matrix construction for efficient factorization and solution algorithm design, which are illustrated graphically in stepwise detail. The algorithms are "blackbox" in the sense that the only inputs are the matrix and right-hand side, without analytical or geometrical information about the origin of the system. We demonstrate linear complexity scaling in both time and memory on four representative families of dense matrices up to one million in size. Parallel scaling up to 16 threads is enabled by a multi-level matrix graph coloring and avoidance of dynamic memory allocations thanks to prefix-sum memory management. An experimental backward error analysis is included. We break down the timings of different phases, identify phases that are memory-bandwidth limited, and discuss alternatives for phases that may be sensitive to the trend to employ lower precisions for performance.
Problem

Research questions and friction points this paper is trying to address.

Develops linear complexity direct solver for hierarchical matrices
Enables fine-grained parallel processing on batch operations
Provides blackbox factorization without analytical system information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear complexity direct solver
Strong recursive skeletonization factorization
Multi-level matrix graph coloring
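As a rough illustration of how graph coloring exposes parallelism, the sketch below applies a single-level greedy coloring to a small conflict graph: nodes that share an edge must not be processed at the same time, so nodes with the same color form a batch that can run concurrently. This is a generic greedy scheme, not the paper's multi-level matrix graph coloring, whose details are not given here; `greedy_coloring`, `batches_by_color`, and the example graph are assumptions made for illustration.

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest color not used by an already-colored neighbor."""
    colors = {}
    for node in sorted(adjacency):
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def batches_by_color(colors):
    """Group nodes by color; each group has no internal conflicts."""
    batches = {}
    for node, color in colors.items():
        batches.setdefault(color, []).append(node)
    return [batches[c] for c in sorted(batches)]


# Hypothetical conflict graph: a chain of four nodes, 0 - 1 - 2 - 3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

colors = greedy_coloring(adjacency)
batches = batches_by_color(colors)

# Each batch can be processed concurrently; batches run one after another.
for batch in batches:
    print("process concurrently:", batch)
```

Greedy coloring does not guarantee the minimum number of colors in general, but fewer colors mean fewer sequential batch steps, which is the quantity a parallel schedule cares about.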
Wajih Boukaram
Applied Mathematics and Computational Sciences, King Abdullah University of Science and Technology, 4700 KAUST, 23955-6900, Thuwal, KSA
David Keyes
Applied Mathematics and Computational Sciences, King Abdullah University of Science and Technology, 4700 KAUST, 23955-6900, Thuwal, KSA
Sherry Li
Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, MS 50A-3111, 94710, Berkeley, California, USA
Yang Liu
Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, MS 50A-3111, 94710, Berkeley, California, USA
George Turkiyyah
King Abdullah University of Science and Technology
computational science, large scale problems, real-time simulation