A Structure-Aware Irregular Blocking Method for Sparse LU Factorization

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In sparse LU factorization, symbolic analysis yields nonzero patterns concentrated along the diagonal and bottom-right region, causing severe load imbalance under regular 2D blocking; moreover, existing matrix features inadequately support adaptive blocking. To address this, we propose a structure-aware irregular blocking method: we introduce a novel local nonzero density metric based on diagonal blocks, and integrate fine-grained and coarse-grained blocking strategies to dynamically adapt to both dense and sparse subregions. Furthermore, we model task dependencies via a dependency tree and optimize GPU parallelism to achieve balanced workloads across hierarchical levels and within each level. On a single NVIDIA A100 GPU, our method achieves 1.50× and 3.32× speedup over PanguLU and SuperLU_DIST, respectively; with four GPUs, it attains 1.40× and 3.84× speedup, demonstrating significantly improved parallel scalability and efficiency.
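The summary describes two mechanisms: a local nonzero density metric computed over diagonal blocks, and block sizes chosen adaptively from that metric (fine-grained in dense regions, coarse-grained in sparse ones). The paper's implementation is not reproduced here; the sketch below is a minimal plain-Python illustration of how such a metric and size selection could work. The function names, the probe width, and the density threshold are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: all names, the probe width, and the
# threshold are assumptions, not taken from the paper.

def diagonal_block_density(nnz_pattern, n, probe=64):
    """Nonzero density of each probe x probe diagonal block.

    nnz_pattern: set of (row, col) nonzero coordinates after
    symbolic factorization, for an n x n matrix.
    """
    densities = []
    for start in range(0, n, probe):
        end = min(start + probe, n)
        size = end - start
        # Count nonzeros falling inside this diagonal block.
        nnz = sum(1 for (r, c) in nnz_pattern
                  if start <= r < end and start <= c < end)
        densities.append(nnz / (size * size))
    return densities

def choose_block_sizes(densities, fine=32, coarse=256, threshold=0.25):
    """Fine-grained blocks where the diagonal is dense, coarse elsewhere."""
    return [fine if d >= threshold else coarse for d in densities]
```

For example, a pattern whose leading diagonal block is half full would map to a fine block, while an empty trailing region would map to a coarse one; the real method would additionally balance nonzeros across the resulting blocks rather than apply a single fixed threshold.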

📝 Abstract
In sparse LU factorization, the nonzero elements produced by symbolic factorization tend to be distributed in the diagonal and bottom-right regions of the sparse matrix. Regular 2D blocking on this non-uniform structure can therefore lead to workload imbalance across blocks. Moreover, existing matrix features fail to provide effective guidance for blocking. In this paper, we propose a structure-aware irregular blocking method for numerical factorization. A novel diagonal block-based feature is introduced to effectively characterize the local nonzero distribution of sparse matrices. Based on this, we further propose an irregular blocking method that adjusts block sizes according to the local distribution of nonzeros. The strategy uses fine-grained blocks in dense regions and coarse-grained blocks in sparse regions, balancing the nonzeros of blocks both within the same level and across levels of the dependency tree. Experiments demonstrate that, on a single NVIDIA A100 GPU, our proposed irregular blocking method achieves average speedups of 1.50x and 3.32x over PanguLU and the latest SuperLU_DIST, respectively. In addition, it achieves speedups of 1.40x and 3.84x over PanguLU and SuperLU_DIST on 4 NVIDIA A100 GPUs.
Problem

Research questions and friction points this paper is trying to address.

Addresses workload imbalance in sparse LU factorization blocking
Introduces diagonal block-based feature for local nonzero distribution
Proposes irregular blocking method adjusting block sizes by density
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-aware irregular blocking for sparse LU factorization
Diagonal block-based feature characterizes local nonzero distribution
Adjusts block sizes using fine-grained and coarse-grained blocks
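The innovation points above mention balancing nonzeros both within and across levels of a dependency tree. One standard way to obtain those levels is topological level scheduling: blocks in the same level have no dependencies on each other and can be factorized concurrently on the GPU. The sketch below is an illustrative implementation of that idea, not the paper's code; the dependency representation is an assumption.

```python
# Illustrative level scheduling on a block dependency graph.
# The dict-of-dependencies representation is an assumption for
# illustration, not the paper's data structure.
from collections import defaultdict, deque

def levelize(deps):
    """deps maps each block to the blocks it depends on.

    Returns a list of levels; all blocks within one level are
    mutually independent and can be processed in parallel.
    """
    indeg = {b: len(d) for b, d in deps.items()}
    children = defaultdict(list)
    for b, d in deps.items():
        for parent in d:
            children[parent].append(b)
    # Blocks with no unresolved dependencies form the first level.
    frontier = deque(b for b, k in indeg.items() if k == 0)
    levels = []
    while frontier:
        level = list(frontier)
        levels.append(level)
        frontier = deque()
        for b in level:
            for c in children[b]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    frontier.append(c)
    return levels
```

With equal nonzero counts per block within a level, each GPU pass over a level finishes without stragglers, which is the balance the irregular blocking aims to provide.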
Zhen Hu
University of Michigan-Dearborn
Design Under Uncertainty · Uncertainty Quantification · Structural Health Monitoring · Machine Learning
Dongliang Xiong
College of Integrated Circuits, Zhejiang University, 311200, China
Kai Huang
College of Integrated Circuits, Zhejiang University, 311200, China
Changjun Wu
College of Integrated Circuits, Zhejiang University, 311200, China
Xiaowen Jiang
College of Integrated Circuits, Zhejiang University, 311200, China