sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

📅 2025-01-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address low GPU parallel efficiency and fill-in in the Cholesky factorization of sparse symmetric matrices with arrowhead and band structure, which are common in scientific computing, this paper proposes sTiles, a GPU-accelerated, tile-based factorization framework. It reduces fill-in through permutation, runs tasks through a structure-aware static scheduler, and balances tile size against the parallelism the arrowhead structure allows. To expose more parallelism, it uses a left-looking Cholesky variant that breaks the sequential dependencies in trailing-submatrix updates with tree reductions. On an NVIDIA A100 GPU, the reported speedups reach 8.41×, 9.34×, 5.07×, and 11.08× over CHOLMOD, SymPACK, MUMPS, and PARDISO, respectively, and 5× over a 32-core AMD EPYC CPU. The main contributions are a structure-aware approach to sparse Cholesky factorization and a GPU execution model tailored to arrowhead sparsity patterns.
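The link between ordering and fill-in can be illustrated with a small sketch. It is not code from sTiles: the helper names (`arrowhead`, `cholesky`, `nnz_lower`) and the test matrix are assumptions made for this illustration. It factorizes the same arrowhead pattern twice, once with the dense row and column ordered first and once with them ordered last, and counts how many new nonzeros appear in the factor.

```python
import math

def arrowhead(n, dense_first):
    """Hypothetical test matrix: diagonal plus one dense row/column (SPD)."""
    k = 0 if dense_first else n - 1        # index of the dense row/column
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(n + 1)             # diagonally dominant, hence SPD
        if i != k:
            a[i][k] = a[k][i] = 1.0
    return a

def cholesky(a):
    """Plain dense Cholesky, returning the lower-triangular factor."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def nnz_lower(m, tol=1e-12):
    """Count nonzeros in the lower triangle, diagonal included."""
    return sum(1 for i, row in enumerate(m)
               for j in range(i + 1) if abs(row[j]) > tol)

n = 6
first = arrowhead(n, dense_first=True)
last = arrowhead(n, dense_first=False)
fill_dense_first = nnz_lower(cholesky(first)) - nnz_lower(first)
fill_dense_last = nnz_lower(cholesky(last)) - nnz_lower(last)
print(fill_dense_first, fill_dense_last)
```

With the dense row ordered first, every entry below the diagonal fills in; with it ordered last, the factor keeps the sparsity pattern of the original matrix. This is the textbook motivation for permuting an arrowhead matrix before factorization, shown here on a toy case rather than with the permutation techniques the paper actually uses.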

📝 Abstract

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle challenging arrowhead sparse matrices with variable bandwidths, common in scientific and engineering fields. It minimizes fill-in during Cholesky factorization using permutation techniques and employs a static scheduler to manage tasks on shared-memory systems with GPU accelerators. sTiles balances tile size and parallelism: larger tiles enhance algorithmic intensity but increase floating-point operations and memory usage, while parallelism is constrained by the arrowhead structure. To expose more parallelism, a left-looking Cholesky variant breaks sequential dependencies in trailing submatrix updates via tree reductions. On an NVIDIA A100 GPU, evaluations show that sTiles achieves speedups of up to 8.41X, 9.34X, 5.07X, and 11.08X compared to CHOLMOD, SymPACK, MUMPS, and PARDISO, respectively, and a 5X speedup compared to a 32-core AMD EPYC CPU. Our generic software framework imports well-established concepts from dense matrix computations, but each of them requires customization when deployed on hybrid architectures to best handle factorizations of sparse matrices with arrowhead structures.
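The left-looking tile variant with tree-reduced updates can be sketched as follows. This is a minimal CPU illustration in plain Python under stated assumptions, not the sTiles implementation: every helper name (`matmul_nt`, `tree_sum`, `tile_cholesky_left_looking`, and so on) is hypothetical, all tiles are treated as dense, and the pairwise reduction only mimics the order in which independent partial updates could be combined on parallel hardware.

```python
import math

def matmul_nt(a, b):
    """Return a @ b^T for row-major lists of lists."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def chol(a):
    """Dense Cholesky of one diagonal tile (lower factor)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l

def solve_lt(b, l):
    """Solve X @ l^T = b for X, where l is lower triangular."""
    n = len(l)
    x = []
    for row in b:
        xr = [0.0] * n
        for j in range(n):
            xr[j] = (row[j] - sum(xr[k] * l[j][k] for k in range(j))) / l[j][j]
        x.append(xr)
    return x

def tree_sum(terms):
    """Pairwise (tree) reduction: the additions on each level are independent."""
    while len(terms) > 1:
        paired = [add(terms[i], terms[i + 1]) for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])
        terms = paired
    return terms[0]

def tile_cholesky_left_looking(a, nt):
    """a[i][k] holds tile (i, k) of the lower triangle; returns the factor tiles."""
    l = [[None] * nt for _ in range(nt)]
    for k in range(nt):                      # process one tile column at a time
        for i in range(k, nt):
            t = a[i][k]
            if k > 0:
                # Gather all updates from tile columns to the left, reduce as a tree.
                t = sub(t, tree_sum([matmul_nt(l[i][p], l[k][p]) for p in range(k)]))
            l[i][k] = chol(t) if i == k else solve_lt(t, l[k][k])
    return l

def to_tiles(a, b):
    nt = len(a) // b
    return [[[row[j * b:(j + 1) * b] for row in a[i * b:(i + 1) * b]]
             for j in range(nt)] for i in range(nt)]

def from_tiles(l, b):
    nt = len(l)
    out = [[0.0] * (nt * b) for _ in range(nt * b)]
    for i in range(nt):
        for j in range(i + 1):
            for r in range(b):
                out[i * b + r][j * b:(j + 1) * b] = l[i][j][r]
    return out

# Demo on a small diagonally dominant (hence SPD) matrix, tile size 2.
n, b = 6, 2
A = [[1.0 / (1 + abs(i - j)) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
L = from_tiles(tile_cholesky_left_looking(to_tiles(A, b), n // b), b)
LLt = matmul_nt(L, L)
err = max(abs(LLt[i][j] - A[i][j]) for i in range(n) for j in range(n))
print(err)
```

In a right-looking scheme each factored column immediately updates the trailing submatrix, so updates to a given tile arrive one after another. Here the updates for tile (i, k) are formed as independent products and only then summed, which is what makes a tree-shaped reduction possible. A sparse implementation would presumably skip structurally empty tiles; this sketch does not attempt that.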

Problem

Research questions and friction points this paper is trying to address.

Sparse Matrix Decomposition

GPU Acceleration

Arrowhead and Banded Matrices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Matrix Decomposition

GPU Acceleration

Parallel Computing Optimization

🔎 Similar Papers

A fast Multiplicative Updates algorithm for Non-negative Matrix Factorization