🤖 AI Summary
Existing spiking neural network (SNN) hardware accelerators typically exploit the temporal sparsity of spike events while leaving synaptic weight sparsity largely untapped, which limits energy efficiency and hardware utilization.
Method: This work proposes a software–hardware co-design approach for bilateral (dual-side) sparsity optimization that jointly exploits temporal spike sparsity and weight sparsity. It combines gradient-rewiring-based pruning with a variant of Learned Step Size Quantization (LSQ) tailored to SNNs, designs a bitmap-driven bilateral sparsity detector, and implements an FPGA-based spatial architecture with a cross-layer pipelined spatiotemporal dataflow, in place of the overlay architecture used by earlier FireFly designs.
Results: Evaluated on MNIST, DVS-Gesture, and CIFAR-10, the models reach 85–95% sparsity with 4-bit quantization and negligible accuracy loss, and the accelerator delivers energy efficiency of 10,047, 3,683, and 2,327 FPS/W on the three datasets, respectively. By exploiting both sides of SNN sparsity in hardware, the design improves energy efficiency and makes fuller use of FPGA resources.
📝 Abstract
Spiking Neural Networks (SNNs), with their brain-inspired structure using discrete spikes instead of continuous activations, are gaining attention for their potential for efficient processing on neuromorphic chips. While current SNN hardware accelerators often prioritize temporal spike sparsity, exploiting sparse synaptic weights offers significant untapped potential for even greater efficiency. To address this, we propose FireFly-S, a Sparse extension of the FireFly series. This co-optimized software-hardware design focuses on leveraging dual-side sparsity for acceleration. On the software side, we propose a novel algorithmic optimization framework that combines gradient rewiring for pruning with a modified Learned Step Size Quantization (LSQ) tailored for SNNs, which achieves weight sparsity exceeding 85% and enables efficient 4-bit quantization with negligible accuracy loss. On the hardware side, we present an efficient dual-side sparsity detector that employs bitmap-based sparse decoding logic to pinpoint the positions of non-zero weights and input spikes. This logic allows redundant computations to be bypassed directly, thereby enhancing computational efficiency. Unlike the overlay architecture adopted by previous members of the FireFly series, we adopt a spatial architecture with inter-layer pipelining that can fully exploit the nature of Field-Programmable Gate Arrays (FPGAs). A spatial-temporal dataflow is also proposed to support such inter-layer pipelining and avoid long-term temporal dependencies. In experiments conducted on the MNIST, DVS-Gesture, and CIFAR-10 datasets, the FireFly-S model achieves 85-95% sparsity with 4-bit quantization, and the hardware accelerator effectively leverages the dual-side sparsity, delivering 10,047 FPS/W on MNIST, 3,683 FPS/W on DVS-Gesture, and 2,327 FPS/W on CIFAR-10.
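The idea behind a bitmap-based dual-side sparsity detector can be illustrated in software. The following is a minimal Python sketch, not the FireFly-S hardware design: it assumes one bit per weight and one bit per input spike, ANDs the two bitmaps, and visits only the positions where both are non-zero. The function names (`bitmap`, `active_positions`, `sparse_accumulate`) are hypothetical and chosen for illustration only.

```python
def bitmap(values):
    """Pack a sequence into an integer bitmap: bit i is set when values[i] != 0."""
    bits = 0
    for i, v in enumerate(values):
        if v != 0:
            bits |= 1 << i
    return bits


def active_positions(weight_bitmap, spike_bitmap):
    """Yield indices where both the weight and the input spike are non-zero."""
    both = weight_bitmap & spike_bitmap  # dual-side sparsity mask
    while both:
        lowest = both & -both            # isolate the lowest set bit
        yield lowest.bit_length() - 1    # convert that bit to its index
        both ^= lowest                   # clear it and continue


def sparse_accumulate(weights, spikes):
    """Accumulate weights only at positions that survive the bitmap AND.

    Spikes are binary, so each surviving position contributes its weight
    without a multiplication. (Illustrative assumption, not the paper's RTL.)
    """
    w_map = bitmap(weights)
    s_map = bitmap(spikes)
    return sum(weights[i] for i in active_positions(w_map, s_map))


weights = [0, 3, 0, -2, 5, 0, 0, 1]   # pruned, quantized weights (hypothetical)
spikes = [1, 1, 0, 0, 1, 0, 1, 0]     # binary input spikes (hypothetical)
result = sparse_accumulate(weights, spikes)
```

In this example only two of the eight positions carry both a non-zero weight and a spike, so the accumulation touches two entries instead of eight, which is the kind of redundant work a dual-side detector is meant to skip.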