🤖 AI Summary
To address the challenges of high parameter and computational overhead when deploying Spiking Neural Networks (SNNs) on edge devices, and the difficulty existing sparsification methods have in ensuring hardware efficiency and accuracy simultaneously, this paper proposes SpikeNM—the first semi-structured N:M pruning framework tailored for SNNs. The method introduces: (1) block-level learnable N:M sparsity constraints that balance hardware-acceleration feasibility with structural flexibility; (2) an M-way basis-logit parameterization coupled with differentiable top-k sampling for end-to-end training; and (3) eligibility-inspired distillation grounded in temporal credit accumulation, which reduces variance in pruning-probability estimation. Experiments demonstrate that, at 2:4 sparsity, SpikeNM matches or even surpasses the accuracy of dense SNNs across mainstream benchmarks while generating hardware-friendly sparse patterns—significantly improving inference efficiency for edge deployment of SNNs.
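To make the N:M pattern concrete, the sketch below zeroes all but the N largest-magnitude weights in every block of M (here 2:4). Note this magnitude-based mask is only an illustration of the *pattern* SpikeNM enforces; the paper learns its masks end-to-end from scratch rather than pruning by magnitude, and `apply_nm_mask` is a hypothetical helper, not the paper's API.

```python
import numpy as np

def apply_nm_mask(w, n=2, m=4):
    """Keep the n largest-magnitude weights per block of m, zero the rest.

    Illustrative only: SpikeNM learns block masks end-to-end instead of
    using magnitude. Assumes w.size is divisible by m.
    """
    blocks = w.reshape(-1, m)                       # group weights into M-blocks
    # indices of the (m - n) smallest magnitudes in each block
    drop = np.argsort(np.abs(blocks), axis=1)[:, : m - n]
    mask = np.ones_like(blocks)
    np.put_along_axis(mask, drop, 0.0, axis=1)      # zero the dropped positions
    return (blocks * mask).reshape(w.shape)

w = np.arange(1, 9, dtype=float).reshape(2, 4)      # two blocks of four
print(apply_nm_mask(w))  # each block keeps its two largest entries
```

Because every M-weight block has the same fixed non-zero budget, such patterns map directly onto sparse tensor-core style hardware support, which is what makes the 2:4 case deployment-friendly.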
📝 Abstract
Brain-inspired spiking neural networks (SNNs) promise energy-efficient intelligence via event-driven, sparse computation, but deeper architectures inflate parameter counts and computational cost, hindering edge deployment. Recent progress in SNN pruning helps alleviate this burden, yet existing efforts fall into only two families: *unstructured* pruning, which attains high sparsity but is difficult to accelerate on general hardware, and *structured* pruning, which eases deployment but lacks flexibility and often degrades accuracy at matched sparsity. In this work, we introduce **SpikeNM**, the first SNN-oriented *semi-structured* ($N{:}M$) pruning framework that learns sparse SNNs *from scratch*, enforcing *at most $N$* non-zeros per $M$-weight block. To avoid the combinatorial space of size $\sum_{k=1}^{N}\binom{M}{k}$, which grows exponentially with $M$, SpikeNM adopts an $M$-way basis-logit parameterization with a differentiable top-$k$ sampler, *linearizing* per-block complexity to $\mathcal{O}(M)$ and enabling more aggressive sparsification. Further inspired by neuroscience, we propose *eligibility-inspired distillation* (EID), which converts temporally accumulated credits into block-wise soft targets to align mask probabilities with spiking dynamics, reducing sampling variance and stabilizing the search under high sparsity. Experiments show that at $2{:}4$ sparsity, SpikeNM matches and in some cases surpasses dense baselines across mainstream datasets, while yielding hardware-amenable patterns that complement the intrinsic sparsity of spikes.
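The $\mathcal{O}(M)$ parameterization can be illustrated with a relaxed top-$k$ over $M$ per-block logits: instead of enumerating all $\sum_{k=1}^{N}\binom{M}{k}$ support sets, each block carries only $M$ learnable logits from which a soft $k$-hot mask is drawn. The paper's exact sampler is not reproduced here; the sketch below uses $k$ successive tempered softmax draws with suppression of already-selected entries, a common differentiable top-$k$ relaxation, so `soft_topk`, the temperature `tau`, and the suppression constant are all illustrative assumptions.

```python
import numpy as np

def soft_topk(logits, k, tau=0.5):
    """Relaxed top-k over M logits via k successive softmax draws.

    One plausible differentiable top-k surrogate (not necessarily the
    paper's sampler). Returns a soft mask in [0, 1] summing to ~k;
    cost is O(k * M) per block rather than enumerating subsets.
    """
    logits = logits.astype(float).copy()
    mask = np.zeros_like(logits)
    for _ in range(k):
        p = np.exp((logits - logits.max()) / tau)   # tempered softmax
        p /= p.sum()
        mask += p
        logits = logits - 1e3 * p                   # suppress selected entries
    return np.clip(mask, 0.0, 1.0)

block_logits = np.array([5.0, 0.0, 1.0, 0.0])       # M = 4 learnable logits
print(soft_topk(block_logits, k=2))                 # soft 2-hot mask
```

In training, such a soft mask multiplies the block's weights so gradients flow back into the logits; at deployment the mask is hardened to the exact $N{:}M$ pattern.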