Efficient Dynamic Structured Sparse Training with Learned Shuffles

📅 2025-10-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Structured sparse training enables GPU acceleration but suffers from limited expressivity due to fixed sparsity patterns (e.g., block, N:M, diagonal), resulting in substantial accuracy degradation compared to unstructured dynamic sparse training (DST). To address this, we propose Permutation-Augmented Dynamic Sparse Training (PA-DST), which jointly learns, per layer, a differentiable and optimizable permutation matrix alongside structured sparse weights — enabling dynamic rearrangement of weight locations to significantly enhance representational capacity. PA-DST is compatible with mainstream structured sparsity patterns and maintains high sparsity (90%–95%). On ImageNet-1K and WikiText-103, it matches the accuracy of state-of-the-art unstructured DST methods (e.g., RigL, SET), while accelerating training by 1.21× and inference by up to 2.9×.

๐Ÿ“ Abstract
Structured sparsity accelerates training and inference on modern GPUs, yet it still trails unstructured dynamic sparse training (DST) in accuracy. The shortfall stems from a loss of expressivity: whereas a dense layer can realize every possible mask obtained by choosing any $w$ active weights out of $n$, a fixed block or N:M layout explores only a subset of those possibilities. We propose to close this gap by learning, for each layer, a single permutation matrix jointly with the structured weight matrix. Applied to three canonical structures -- block, N:M, and diagonal -- we show that permutation-augmented DST (PA-DST) matches unstructured baselines (RigL, SET) at 90--95% sparsity on ImageNet-1K (ViT-B/16) and WikiText-103 (GPT-2), yet trains up to $1.21\times$ and infers up to $2.9\times$ faster. The results position structure + learned permutation as a sweet spot between accuracy and efficiency.
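The core idea in the abstract can be sketched in a few lines: impose an N:M structure on the weights (here 2:4, keeping the 2 largest-magnitude entries in every group of 4), then compose it with a per-layer permutation of the input features. This is only an illustrative sketch with hypothetical names (`nm_sparsify`, `perm`); PA-DST learns the permutation differentiably and updates the mask dynamically, whereas a fixed random permutation stands in here.

```python
import numpy as np

rng = np.random.default_rng(0)

def nm_sparsify(w, n=2, m=4):
    """Keep only the n largest-magnitude entries in each group of m (N:M sparsity)."""
    groups = w.reshape(-1, m)
    # Indices of the (m - n) smallest-magnitude entries in each group get zeroed.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(w.shape)

d_in, d_out = 8, 8
w = nm_sparsify(rng.standard_normal((d_out, d_in)))  # structured 2:4-sparse weights
perm = rng.permutation(d_in)  # stand-in for the learned permutation matrix
x = rng.standard_normal(d_in)

# Forward pass: permute the input features, then apply the structured-sparse matmul.
# The permutation lets the fixed N:M layout reach weight placements it otherwise could not.
y = w @ x[perm]
```

Because a permutation matrix only reindexes rows or columns, the hardware-friendly N:M kernel is untouched at inference time, which is why the paper can keep the structured speedups while recovering expressivity.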
Problem

Research questions and friction points this paper is trying to address.

Improving structured sparsity accuracy via learned permutations
Closing expressivity gap between structured and unstructured sparsity
Enhancing training efficiency while maintaining model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned permutation matrices for each layer
Augmented structured sparsity with dynamic training
Combined block, N:M and diagonal structures
Abhishek Tyagi
Department of Computer Science, University of Rochester, NY, USA
A. Iyer
The Institute of Optics, University of Rochester, NY, USA
Liam Young
The Institute of Optics, University of Rochester, NY, USA
William H. Renninger
The Institute of Optics, University of Rochester, NY, USA
Christopher Kanan
University of Rochester
Artificial Intelligence, Deep Learning, AGI, Multi-Modal AI, Cognitive Science
Yuhao Zhu
Department of Computer Science, University of Rochester, NY, USA