FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture

📅 2024-08-28

🏛️ IEEE Transactions on Circuits and Systems Part 1: Regular Papers

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Existing spiking neural network (SNN) hardware accelerators mostly exploit only the sparsity of spike events and neglect synaptic weight sparsity, which limits energy efficiency and hardware utilization. Method: This work proposes a software-hardware co-design that exploits dual-side sparsity, that is, sparse input spikes and sparse weights together. It combines gradient-rewiring pruning with a Learned Step Size Quantization (LSQ) variant tailored to SNNs, designs a bitmap-based dual-side sparsity detector, and implements a reconfigurable spatial architecture with inter-layer pipelining and a spatial-temporal dataflow, in contrast to the overlay architecture of earlier FireFly designs. Results: Evaluated on MNIST, DVS-Gesture, and CIFAR-10, the models reach 85–95% sparsity with 4-bit quantization and <0.5% accuracy degradation, and the accelerator delivers 10,047 / 3,683 / 2,327 FPS/W, respectively. The work presents itself as a systematic exploration and hardware exploitation of the dual-side sparsity intrinsic to SNNs, aimed at improving both energy efficiency and hardware adaptability.

📝 Abstract

Spiking Neural Networks (SNNs), with their brain-inspired structure using discrete spikes instead of continuous activations, are gaining attention for their potential for efficient processing on neuromorphic chips. While current SNN hardware accelerators often prioritize temporal spike sparsity, exploiting sparse synaptic weights offers significant untapped potential for even greater efficiency. To address this, we propose FireFly-S, a Sparse extension of the FireFly series. This co-optimized software-hardware design focuses on leveraging dual-side sparsity for acceleration. On the software side, we propose a novel algorithmic optimization framework that combines gradient rewiring for pruning and modified Learned Step Size Quantization (LSQ) tailored for SNNs, which achieves remarkable weight sparsity exceeding 85% and enables efficient 4-bit quantization with negligible accuracy loss. On the hardware side, we present an efficient dual-side sparsity detector employing bitmap-based sparse decoding logic to pinpoint the positions of non-zero weights and input spikes. This logic allows redundant computations to be bypassed directly, thereby enhancing computational efficiency. Unlike the overlay architecture adopted by the previous FireFly series, we adopt a spatial architecture with inter-layer pipelining that can fully exploit the nature of Field-Programmable Gate Arrays (FPGAs). A spatial-temporal dataflow is also proposed to support such inter-layer pipelining and avoid long-term temporal dependencies. In experiments conducted on the MNIST, DVS-Gesture and CIFAR-10 datasets, the FireFly-S model achieves 85-95% sparsity with 4-bit quantization, and the hardware accelerator effectively leverages the dual-side sparsity, delivering outstanding performance metrics of 10,047 FPS/W on MNIST, 3,683 FPS/W on DVS-Gesture, and 2,327 FPS/W on CIFAR-10.
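
The bitmap-based detection described in the abstract can be illustrated in software. The following is only a minimal sketch of the general idea, not the paper's hardware logic: the function names `pack_bitmap` and `dual_sparse_accumulate` and the example values are assumptions made for illustration. It ANDs a weight bitmap with a spike bitmap so that only positions where both are non-zero trigger an accumulation.

```python
def pack_bitmap(values):
    """Pack a sequence into an integer bitmap: bit i is 1 when values[i] is non-zero."""
    bitmap = 0
    for i, v in enumerate(values):
        if v != 0:
            bitmap |= 1 << i
    return bitmap


def dual_sparse_accumulate(weights, spikes):
    """Accumulate weights only where both the weight and the input spike are non-zero.

    Hypothetical helper for illustration; returns (sum, accumulations performed).
    """
    # A set bit in `active` marks a position that contributes to the output.
    active = pack_bitmap(weights) & pack_bitmap(spikes)
    total, ops = 0, 0
    while active:
        lowest = active & -active       # isolate the lowest set bit
        idx = lowest.bit_length() - 1   # position of that bit
        total += weights[idx]           # spikes are binary, so no multiply is needed
        ops += 1
        active ^= lowest                # clear the bit and move to the next one
    return total, ops


# Illustrative values only (not taken from the paper).
weights = [0, 3, 0, -2, 0, 0, 5, 0]
spikes = [1, 1, 0, 0, 0, 1, 1, 0]
total, ops = dual_sparse_accumulate(weights, spikes)
dense = sum(w * s for w, s in zip(weights, spikes))
print(total, ops, dense)
```

In this toy case the dense computation touches eight positions, while the bitmap path performs an accumulation only at the two positions where a non-zero weight meets a spike.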

Problem

Research questions and friction points this paper is trying to address.

Exploiting dual-side sparsity in SNNs for acceleration

Co-optimizing software and hardware for efficient 4-bit quantization

Enhancing computational efficiency with spatial-temporal dataflow

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-side sparsity optimization for SNN acceleration

Gradient rewiring and 4-bit quantization framework

Bitmap-based sparse decoding logic on FPGA
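
As a rough illustration of the 4-bit quantization item above, the sketch below shows the forward pass of generic LSQ-style uniform quantization with a fixed step size. It is an assumption-laden example: the paper's SNN-specific modification is not described in this summary, the step size here is hard-coded rather than learned, and `lsq_quantize` is a hypothetical name.

```python
def lsq_quantize(weights, step, bits=4):
    """Quantize weights to signed `bits`-bit integer codes with a given step size.

    Forward pass only. In LSQ the step size is a trained parameter; here it is
    simply passed in. Python's round() rounds halves to the nearest even integer.
    """
    q_min = -(2 ** (bits - 1))      # -8 for 4 bits
    q_max = 2 ** (bits - 1) - 1     # +7 for 4 bits
    codes = [max(q_min, min(q_max, round(w / step))) for w in weights]
    dequantized = [c * step for c in codes]
    return codes, dequantized


# Illustrative values only (not taken from the paper).
weights = [0.0, 0.26, -0.9, 2.0, -0.1]
codes, deq = lsq_quantize(weights, step=0.25)
print(codes)  # integer codes in [-8, 7]
print(deq)    # values the hardware would effectively compute with
```

Small weights such as -0.1 quantize to zero in this example, which shows how quantization and pruning can both contribute to weight sparsity.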
