AI Summary
Structured sparse training enables GPU acceleration but suffers from limited expressivity due to fixed sparsity patterns (e.g., block, N:M, diagonal), resulting in substantial accuracy degradation compared to unstructured dynamic sparse training (DST). To address this, we propose Permutation-Augmented Dynamic Sparse Training (PA-DST), which jointly learns, per layer, a differentiable and optimizable permutation matrix alongside structured sparse weights, enabling dynamic rearrangement of weight locations to significantly enhance representational capacity. PA-DST is compatible with mainstream structured sparsity patterns and maintains high sparsity (90-95%). On ImageNet-1K and WikiText-103, it matches the accuracy of state-of-the-art unstructured DST methods (e.g., RigL, SET), while accelerating training by 1.21× and inference by up to 2.9×.
Abstract
Structured sparsity accelerates training and inference on modern GPUs, yet it still trails unstructured dynamic sparse training (DST) in accuracy. The shortfall stems from a loss of expressivity: whereas a dense layer can realize every possible mask obtained by choosing any $w$ active weights out of $n$, a fixed block or N:M layout explores only a subset of those possibilities. We propose to close this gap by learning, for each layer, a single permutation matrix jointly with the structured weight matrix. Applied to three canonical structures -- block, N:M, and diagonal -- we show that permutation-augmented DST (PA-DST) matches unstructured baselines (RigL, SET) at 90--95% sparsity on ImageNet-1K (ViT-B/16) and WikiText-103 (GPT-2), yet trains up to $1.21\times$ and infers up to $2.9\times$ faster. The results position structure + learned permutation as a sweet spot between accuracy and efficiency.
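The core mechanism can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the `nm_mask` helper, the 1:4 pattern, and the random permutation standing in for the learned matrix $P$ are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def nm_mask(shape, n=1, m=4):
    """Random N:M mask: keep n entries in every group of m consecutive columns, per row."""
    mask = np.zeros(shape)
    for r in range(shape[0]):
        for c in range(0, shape[1], m):
            keep = rng.choice(m, size=n, replace=False)  # which n of the m slots survive
            mask[r, c + keep] = 1.0
    return mask

out_f, in_f = 8, 16
mask = nm_mask((out_f, in_f))                # 1:4 pattern (75% sparse) for illustration
W = rng.normal(size=(out_f, in_f)) * mask    # structured sparse weight matrix

perm = rng.permutation(in_f)                 # stand-in for the learned permutation
P = np.eye(in_f)[perm]                       # permutation matrix: (P @ x)[i] = x[perm[i]]

x = rng.normal(size=in_f)
y = W @ (P @ x)                              # forward pass: permute features, then sparse matmul

# Dense-equivalent weight W_eff = W @ P: its sparsity pattern is a column
# permutation of the N:M mask, so nonzero locations are no longer locked
# to the fixed grid -- the source of the added expressivity.
W_eff = W @ P
```

In PA-DST the permutation is learned jointly with the weights (the paper describes it as differentiable and optimizable), whereas here it is a fixed random draw; the sketch only shows that $WP$ reaches masks a rigid N:M grid cannot, while the matmul that must run fast still touches only the structured matrix $W$.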