Sparse by Rule: Probability-Based N:M Pruning for Spiking Neural Networks

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high parameter and computational overhead of deploying Spiking Neural Networks (SNNs) on edge devices, and the difficulty existing sparsification methods have in ensuring hardware efficiency and accuracy simultaneously, this paper proposes SpikeNM, the first semi-structured N:M pruning framework tailored to SNNs. The method introduces: (1) block-level learnable N:M sparsity constraints that balance hardware-acceleration feasibility with structural flexibility; (2) an M-way basis-logit parameterization coupled with differentiable top-k sampling for end-to-end training; and (3) eligibility-inspired distillation, grounded in temporal credit accumulation, to reduce variance in pruning-probability estimation. Experiments demonstrate that at 2:4 sparsity, SpikeNM matches or even surpasses the accuracy of dense SNNs across mainstream benchmarks while generating hardware-friendly sparse patterns, significantly improving inference efficiency for edge deployment of SNNs.
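As a concrete picture of the N:M constraint described above, the sketch below applies a plain magnitude-based 2:4 mask in NumPy. The function name and the magnitude criterion are illustrative stand-ins; SpikeNM learns its masks rather than selecting by magnitude.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Keep at most n largest-magnitude entries in each block of m weights.

    Minimal illustration of the N:M sparsity pattern (here 2:4);
    magnitude selection is a stand-in for SpikeNM's learned masks.
    """
    w = weights.reshape(-1, m)                       # group into M-weight blocks
    # indices of the (m - n) smallest-|w| entries per block -> zero them out
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (w * mask).reshape(weights.shape), mask.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.01, 0.6])
pruned, mask = nm_prune(w)   # each 4-weight block keeps its 2 largest entries
```

Because every block has the same worst-case number of non-zeros, the resulting pattern maps directly onto hardware with N:M sparse-matrix support.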

📝 Abstract
Brain-inspired spiking neural networks (SNNs) promise energy-efficient intelligence via event-driven, sparse computation, but deeper architectures inflate parameter counts and computational cost, hindering edge deployment. Recent progress in SNN pruning helps alleviate this burden, yet existing efforts fall into only two families: unstructured pruning, which attains high sparsity but is difficult to accelerate on general hardware, and structured pruning, which eases deployment but lacks flexibility and often degrades accuracy at matched sparsity. In this work, we introduce SpikeNM, the first SNN-oriented semi-structured (N:M) pruning framework that learns sparse SNNs from scratch, enforcing at most N non-zeros per M-weight block. To avoid the combinatorial search space of size $\sum_{k=1}^{N}\binom{M}{k}$, which grows exponentially with M, SpikeNM adopts an M-way basis-logit parameterization with a differentiable top-k sampler, linearizing per-block complexity to $\mathcal{O}(M)$ and enabling more aggressive sparsification. Further inspired by neuroscience, we propose eligibility-inspired distillation (EID), which converts temporally accumulated credits into block-wise soft targets to align mask probabilities with spiking dynamics, reducing sampling variance and stabilizing the search under high sparsity. Experiments show that at 2:4 sparsity, SpikeNM matches and even surpasses dense accuracy across mainstream datasets, while yielding hardware-amenable patterns that complement intrinsic spike sparsity.
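One common way to realize a differentiable top-k over M per-block logits is a sequence of temperature softmaxes, each damping positions that have already absorbed probability mass. The sketch below follows that recipe; the function name, the damping rule, and the temperature are assumptions for illustration, not the authors' exact sampler.

```python
import numpy as np

def soft_topk_mask(logits, n, tau=0.5):
    """Relaxed top-n selection over the M logits of one weight block.

    Hypothetical sketch of the M-way logit + differentiable top-k idea:
    n successive softmaxes, each suppressing already-selected positions
    via a log(1 - p) penalty, yield a soft mask summing to n. Cost is
    O(n * M) per block instead of enumerating all C(M, k) subsets.
    """
    logits = np.asarray(logits, dtype=float).copy()
    mask = np.zeros_like(logits)
    for _ in range(n):
        p = np.exp((logits - logits.max()) / tau)    # stable softmax at temp tau
        p /= p.sum()
        mask += p                                    # accumulate soft selection
        # damp selected entries so the next round picks a different position
        logits = logits + np.log1p(-np.minimum(p, 1 - 1e-6))
    return mask

m = soft_topk_mask(np.array([2.0, -1.0, 0.5, 0.1]), n=2)
```

In a training loop, one would typically pair such a relaxed mask with a straight-through hard top-k at forward time, so gradients flow through the soft probabilities while the network sees an exact N:M pattern.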
Problem

Research questions and friction points this paper is trying to address.

SNN pruning faces unstructured vs structured trade-offs
Existing methods sacrifice either hardware efficiency or accuracy
Semi-structured N:M pruning needs scalable optimization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-structured N:M pruning for SNNs
Linear complexity via basis-logit parameterization
Eligibility-inspired distillation aligns mask probabilities
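The eligibility-inspired distillation idea, converting temporally accumulated credits into block-wise soft targets, can be sketched as follows. The decaying |w * grad| trace is an eligibility-style stand-in; SpikeNM's actual credit definition may differ, and all names here are hypothetical.

```python
import numpy as np

def eid_soft_targets(weight_grads, weights, m=4, decay=0.9):
    """Block-wise soft targets from temporally accumulated credits.

    Hedged sketch: accumulate a decaying |w * dL/dw| credit over timesteps
    (an eligibility-trace stand-in), then softmax within each M-weight
    block to obtain soft targets for the mask distribution.
    """
    trace = np.zeros_like(weights)
    for g in weight_grads:                  # one gradient snapshot per timestep
        trace = decay * trace + np.abs(weights * g)
    blocks = trace.reshape(-1, m)
    e = np.exp(blocks - blocks.max(axis=1, keepdims=True))   # stable softmax
    return (e / e.sum(axis=1, keepdims=True)).reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
grads = [rng.standard_normal(8) for _ in range(3)]
targets = eid_soft_targets(grads, w)        # per-block distributions over M slots
```

Aligning the learned mask probabilities to such targets (e.g. with a KL term) gives the sampler a lower-variance signal than the raw stochastic masks alone.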
👥 Authors
Shuhan Ye (Nanyang Technological University)
Yi Yu (Nanyang Technological University)
Qixin Zhang (Nanyang Technological University)
Chenqi Kong (Nanyang Technological University)
Qiangqiang Wu (Postdoc, City University of Hong Kong; Princeton University) — computer vision, self-supervised temporal representation learning, healthcare AI
Xudong Jiang (Nanyang Technological University)
Dacheng Tao (Nanyang Technological University) — artificial intelligence, machine learning, computer vision, image processing, data mining