Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of parallel preconditioning for large-scale sparse graph Laplacian linear systems. We propose a GPU-accelerated randomized method for constructing approximate Cholesky preconditioners. Unlike conventional incomplete factorizations that rely on static sparsity patterns, our approach employs a randomized strategy to dynamically determine fill-in retention, enabling purely algebraic, low-overhead, fine-grained parallel factorization. We further introduce a sparse dependency-graph-driven dynamic task scheduler and CPU/GPU co-optimization, overcoming the limitations imposed by static structural assumptions. Experimental results on graph Laplacian systems demonstrate that, compared to state-of-the-art preconditioners, including algebraic multigrid (AMG) and incomplete Cholesky (IC), our method converges significantly faster, improves end-to-end solution efficiency, and reduces preprocessing time by an order of magnitude. Moreover, it attains a GPU speedup exceeding 12×.

📝 Abstract
We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as discretization of a partial differential equation, spectral graph partitioning, and learning problems on graphs. The preconditioner belongs to the family of incomplete factorizations and is purely algebraic. Unlike traditional incomplete factorizations, the new method employs randomization to determine whether or not to keep fill-ins, i.e., newly generated nonzero elements during Gaussian elimination. Since the sparsity pattern of the randomized factorization is unknown, computing such a factorization in parallel is extremely challenging, especially on many-core architectures such as GPUs. Our parallel algorithm dynamically computes the dependency among row/column indices of the Laplacian matrix to be factorized and processes the independent indices in parallel. Furthermore, unlike previous approaches, our method requires little pre-processing time. We implement the parallel algorithm for multi-core CPUs and GPUs, and we compare its performance to other state-of-the-art methods.
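The randomized fill-in rule described in the abstract can be illustrated with a toy sequential sketch: eliminating a vertex of a graph Laplacian normally creates a clique of fill-in among all of its neighbors, but a randomized scheme keeps only a sampled subset. The function name `randomized_eliminate` and the specific sampling rule below (a Kyng–Sachdeva-style clique sample, which matches the exact Schur complement in expectation) are illustrative assumptions, not necessarily the paper's exact algorithm.

```python
import random

def randomized_eliminate(adj, v, rng=random.random):
    """Eliminate vertex v from a weighted graph (adjacency-dict Laplacian),
    keeping only a sampled subset of the fill-in clique.

    Exact Cholesky would add all C(|N(v)|, 2) fill edges among v's
    neighbors; here each neighbor samples one partner with probability
    proportional to edge weight, so the retained fill-in is linear in
    deg(v). (Illustrative rule; the paper's strategy may differ.)
    """
    nbrs = list(adj[v].items())            # [(u, w_vu), ...]
    total = sum(w for _, w in nbrs)
    for u, w_u in nbrs:
        # sample a partner proportional to edge weight (may pick u itself,
        # in which case no fill edge is added)
        partner, w_p = nbrs[-1]
        r = rng() * total
        acc = 0.0
        for x, w_x in nbrs:
            acc += w_x
            if acc >= r:
                partner, w_p = x, w_x
                break
        if partner != u:
            # fill weight chosen so the sampled graph equals the exact
            # Schur complement (weight w_u * w_p / total) in expectation
            w_fill = w_u * w_p / (w_u + w_p)
            adj[u][partner] = adj[u].get(partner, 0.0) + w_fill
            adj[partner][u] = adj[partner].get(u, 0.0) + w_fill
    for u, _ in nbrs:                      # disconnect and remove v
        adj[u].pop(v, None)
    del adj[v]
    return adj
```

Repeating this elimination over all vertices yields the sparse approximate factor; because the kept fill-ins are chosen at run time, the sparsity pattern is unknown in advance, which is exactly what makes the parallel scheduling in the paper nontrivial.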
Problem

Research questions and friction points this paper is trying to address.

Constructs parallel GPU-accelerated approximate Cholesky preconditioners
Solves large sparse linear systems from graph Laplacians
Uses randomization for fill-in decisions in factorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel GPU-accelerated randomized Cholesky preconditioner
Dynamic dependency computation for parallel processing
Minimal pre-processing time for efficient factorization
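The dynamic dependency idea can be sketched as wavefront scheduling: indices whose dependencies are all resolved form a level that can be processed in parallel, and completing an index releases its dependents. The function `dynamic_levels` and the dict-based dependency map are assumptions for illustration; the paper's GPU scheduler discovers dependencies on the fly during factorization rather than from a precomputed structure.

```python
from collections import defaultdict, deque

def dynamic_levels(deps):
    """Group indices into levels of mutually independent work.

    deps maps each index to the set of indices it depends on. The
    scheduler tracks only unfinished-dependency counts, so it needs no
    precomputed sparsity pattern. (Illustrative sketch; the actual GPU
    task scheduler is more involved.)
    """
    pending = {i: len(d) for i, d in deps.items()}
    dependents = defaultdict(list)
    for i, d in deps.items():
        for j in d:
            dependents[j].append(i)
    ready = deque(i for i, c in pending.items() if c == 0)
    levels = []
    while ready:
        level = sorted(ready)      # all currently ready indices are independent
        ready.clear()
        levels.append(level)
        for j in level:            # "complete" j and release its dependents
            for i in dependents[j]:
                pending[i] -= 1
                if pending[i] == 0:
                    ready.append(i)
    return levels
```

For example, `dynamic_levels({0: set(), 1: {0}, 2: {0}, 3: {1, 2}})` returns `[[0], [1, 2], [3]]`: index 0 is factored first, indices 1 and 2 can then proceed concurrently, and index 3 waits for both.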
Tianyu Liang (University of California, Berkeley)
Chao Chen (North Carolina State University)
Yotam Yaniv (Lawrence Berkeley National Lab)
Hengrui Luo (unknown affiliation)
David Tench (Lawrence Berkeley National Lab)
Xiaoye S. Li (Lawrence Berkeley National Lab)
A. Buluç (Lawrence Berkeley National Lab)
James Demmel (University of California, Berkeley)