FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture

📅 2024-08-28
🏛️ IEEE Transactions on Circuits and Systems Part 1: Regular Papers
📈 Citations: 4
Influential: 0
🤖 AI Summary
Problem: Existing spiking neural network (SNN) hardware accelerators typically exploit only the temporal sparsity of spike events and neglect synaptic weight sparsity, which limits energy efficiency and hardware utilization. Method: This work proposes a software-hardware co-design approach that jointly exploits both spike sparsity and weight sparsity (dual-side sparsity). It integrates gradient-rewiring pruning with an SNN-tailored variant of Learned Step Size Quantization (LSQ), designs a bitmap-driven dual-side sparsity detector, and implements a reconfigurable spatial architecture with inter-layer pipelining and a spatial-temporal dataflow, replacing the overlay architecture used by earlier FireFly designs. Results: Evaluated on MNIST, DVS-Gesture, and CIFAR-10, the models reach 85–95% sparsity with 4-bit weight/activation quantization, and the accelerator delivers 10,047 / 3,683 / 2,327 FPS/W energy efficiency, respectively, with <0.5% accuracy degradation. The work systematically explores the dual-side sparsity intrinsic to SNNs and exploits it efficiently in hardware, improving both energy efficiency and hardware adaptability.

📝 Abstract
Spiking Neural Networks (SNNs), with their brain-inspired structure using discrete spikes instead of continuous activations, are gaining attention for their potential for efficient processing on neuromorphic chips. While current SNN hardware accelerators often prioritize temporal spike sparsity, exploiting sparse synaptic weights offers significant untapped potential for even greater efficiency. To address this, we propose FireFly-S, a Sparse extension of the FireFly series. This co-optimized software-hardware design focuses on leveraging dual-side sparsity for acceleration. On the software side, we propose a novel algorithmic optimization framework that combines gradient rewiring for pruning with a modified Learned Step Size Quantization (LSQ) tailored for SNNs, which achieves weight sparsity exceeding 85% and enables efficient 4-bit quantization with negligible accuracy loss. On the hardware side, we present an efficient dual-side sparsity detector that employs bitmap-based sparse decoding logic to pinpoint the positions of non-zero weights and input spikes. This logic allows redundant computations to be bypassed directly, thereby enhancing computational efficiency. Unlike the overlay architecture adopted by the previous FireFly series, we adopt a spatial architecture with inter-layer pipelining that can fully exploit the nature of Field-Programmable Gate Arrays (FPGAs). A spatial-temporal dataflow is also proposed to support this inter-layer pipelining and avoid long-term temporal dependencies. In experiments conducted on the MNIST, DVS-Gesture, and CIFAR-10 datasets, the FireFly-S model achieves 85–95% sparsity with 4-bit quantization, and the hardware accelerator effectively leverages the dual-side sparsity, delivering 10,047 FPS/W on MNIST, 3,683 FPS/W on DVS-Gesture, and 2,327 FPS/W on CIFAR-10.
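
The bitmap-based detection described in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware logic: it assumes one bit per synapse in a weight bitmap and one bit per input in a spike bitmap, ANDs the two, and visits only the positions where both are non-zero. The function names `nonzero_pairs` and `sparse_accumulate` are hypothetical, chosen for this illustration.

```python
from typing import List


def nonzero_pairs(weight_bitmap: int, spike_bitmap: int, width: int) -> List[int]:
    """Return positions where the weight is non-zero AND the input spiked."""
    # Bitwise AND keeps only positions that are set in both bitmaps.
    active = weight_bitmap & spike_bitmap & ((1 << width) - 1)
    positions = []
    while active:
        lowest = active & -active                  # isolate the lowest set bit
        positions.append(lowest.bit_length() - 1)  # convert it to an index
        active ^= lowest                           # clear that bit and continue
    return positions


def sparse_accumulate(weights: List[int], spikes: List[int]) -> int:
    """Sum weights only at positions that survive the dual-side check."""
    width = len(weights)
    # Build the two bitmaps (assumption: bit i corresponds to element i).
    w_bitmap = sum(1 << i for i, w in enumerate(weights) if w != 0)
    s_bitmap = sum(1 << i for i, s in enumerate(spikes) if s)
    return sum(weights[i] for i in nonzero_pairs(w_bitmap, s_bitmap, width))


weights = [0, 3, 0, -2, 5, 0, 0, 1]
spikes = [1, 1, 0, 0, 1, 0, 1, 1]
result = sparse_accumulate(weights, spikes)
dense = sum(w * s for w, s in zip(weights, spikes))
```

Here only three of the eight positions are visited, and `result` matches the dense dot product `dense`. In hardware the loop would presumably be replaced by priority-encoding logic, but that mapping is an assumption and is not taken from the abstract.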
Problem

Research questions and friction points this paper is trying to address.

Exploiting dual-side sparsity in SNNs for acceleration
Co-optimizing software-hardware for efficient 4-bit quantization
Enhancing computational efficiency with spatial-temporal dataflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-side sparsity optimization for SNN acceleration
Gradient rewiring and 4-bit quantization framework
Bitmap-based sparse decoding logic on FPGA
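
As a rough illustration of the 4-bit quantization listed above, the forward pass of standard Learned Step Size Quantization can be sketched as follows. This is the generic LSQ formulation (scale by the step size, clip to the signed integer range, round), not the SNN-tailored modification the paper proposes, whose details are not given here. The function name `lsq_quantize` and the fixed step value are assumptions; in LSQ the step size is a trained parameter.

```python
from typing import List, Tuple


def lsq_quantize(weights: List[float], step: float, bits: int = 4) -> Tuple[List[int], List[float]]:
    """Forward pass of standard LSQ for signed values (training is omitted)."""
    q_neg = 2 ** (bits - 1)      # magnitude of the most negative code (8 for 4 bits)
    q_pos = 2 ** (bits - 1) - 1  # most positive code (7 for 4 bits)
    codes = []
    for w in weights:
        scaled = w / step                           # express the weight in units of the step
        clipped = max(-q_neg, min(q_pos, scaled))   # clip to the representable range
        codes.append(round(clipped))                # Python rounds exact ties to even
    dequantized = [c * step for c in codes]         # values used in place of the originals
    return codes, dequantized


# Hypothetical weights and step size, chosen only for the example.
codes, dequantized = lsq_quantize([0.26, -0.9, 0.04, 1.2], step=0.1)
```

Small weights quantize to the code 0, which is the kind of zero a bitmap-based detector would skip; how pruning and quantization interact in the actual framework is not specified in this summary.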
Tenglong Li
Institute of Automation, Chinese Academy of Sciences
Hardware Architecture
Jindong Li
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Guobin Shen
School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Dongcheng Zhao
Beijing Institute of AI Safety and Governance
Spiking Neural Networks, Event Based Vision, Brain-inspired AI, LLM Safety
Qian Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Yi Zeng
Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China, and University of Chinese Academy of Sciences, Beijing 100049, China, and Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China