ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

📅 2026-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing spiking neural network (SNN) accelerators synchronize between layers, which prevents elastic inference and delays the response to salient inputs. The authors propose ELSA, a near-SRAM dataflow architecture built around a fine-grained spine/token-wise pipeline that forwards each spine/token as soon as it is produced. A bundled address event representation protocol reduces network-on-chip traffic, and a mini-batch spiking Gustavson-product cuts memory accesses while exploiting the sparsity of spike activity. Together, these remove the inter-layer synchronization barrier under event-driven execution. For a 4-bit ResNet-50 at on-par accuracy, ELSA achieves a 3.4× speedup and 13.6× higher energy efficiency over the state-of-the-art quantized ANN accelerator ANT, and a 2.9× speedup and 22.1× higher energy efficiency over the state-of-the-art SNN accelerator PAICORE.
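To make the effect of the inter-layer synchronization barrier concrete, here is a minimal timing sketch. It is a toy model, not the paper's simulator: the function name `first_output_time`, the unit cost per spine/token, and the assumption that each output depends on exactly one input from the previous layer are all illustrative choices made here.

```python
def first_output_time(num_layers, num_tokens, cost=1, layer_sync=False):
    """Time at which the last layer emits its first spine/token.

    Toy assumptions (not from the paper): every layer is its own pipeline
    stage, handles one spine/token at a time at a fixed `cost`, and output i
    of a layer depends only on output i of the previous layer.
    """
    prev = [0] * num_tokens  # all network inputs are available at t = 0
    for _ in range(num_layers):
        cur = []
        free_at = 0              # when this layer's stage is next idle
        barrier = max(prev)      # when the previous layer has fully finished
        for i in range(num_tokens):
            # With a layer-wise barrier, nothing starts before the previous
            # layer is complete; otherwise each item starts as soon as its
            # own input has arrived.
            ready = barrier if layer_sync else prev[i]
            start = max(ready, free_at)
            free_at = start + cost
            cur.append(free_at)
        prev = cur
    return prev[0]


if __name__ == "__main__":
    print(first_output_time(4, 16, layer_sync=True))   # 49
    print(first_output_time(4, 16, layer_sync=False))  # 4
```

In this model the barrier makes the first response wait for every earlier layer to finish all of its spines/tokens, while the fine-grained pipeline only pays one stage delay per layer. Real layers such as convolutions need several inputs per output, so the gap in practice will differ from these illustrative numbers.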
📝 Abstract
Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively, enabling responses to salient inputs much earlier than full evaluation. However, existing SNN-specific accelerators cannot capitalize on this property. Layer-by-layer designs emit outputs only after all layers are complete, while time-step-by-time-step designs rely on coarse-grained, layer-wise pipelines that require synchronizing all spines/tokens within a layer. This barrier prevents results from being forwarded immediately, delaying the earliest possible response and forfeiting the benefits of elastic inference. To address these challenges, we propose ELSA, a near-SRAM dataflow architecture that realizes true elastic inference through a fine-grained spine/token-wise pipeline and hardware optimizations tailored to SNNs. ELSA forwards each spine/token immediately upon production, forming a continuous streaming pipeline that substantially reduces the latency to the first response. To enhance this lightweight execution, ELSA introduces a bundled address event representation protocol to lower communication traffic of network-on-chip (NoC), and leverages mini-batch spiking Gustavson-product to cut memory access and exploit inherent sparsity. Combined with mapping and scheduling optimizations, ELSA achieves efficient, event-driven computation without compromising accuracy. Experiments show that SNNs can outperform quantized artificial neural networks (QANNs) while maintaining on-par accuracy. For a 4-bit ResNet-50, ELSA achieves 3.4× speedup and 13.6× higher energy efficiency over the SOTA QANN accelerator (ANT), and 2.9× speedup and 22.1× energy efficiency gains over the SOTA SNN accelerator (PAICORE).
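The idea behind a spike-driven Gustavson (row-wise) product can be sketched in a few lines. This is an illustrative reading of the abstract, not ELSA's hardware dataflow: the function names, the `batch_size` parameter, and the interpretation of "mini-batch" as sharing one fetched weight row across the spines/tokens of a small group are assumptions made here. Because spikes are binary, each output row is just the sum of the weight rows selected by active inputs, so only additions are needed and silent inputs are skipped entirely.

```python
def spike_gustavson(spikes, weights):
    """Row-wise product for binary spikes: out[t] = sum of weights[c] where spikes[t][c] == 1."""
    c_out = len(weights[0])
    out = [[0] * c_out for _ in spikes]
    row_fetches = 0  # number of weight rows read (a proxy for memory access)
    for t, spike_row in enumerate(spikes):
        for c, s in enumerate(spike_row):
            if not s:
                continue  # no spike: no fetch, no arithmetic
            row_fetches += 1
            w = weights[c]
            for j in range(c_out):
                out[t][j] += w[j]  # addition only, no multiply
    return out, row_fetches


def spike_gustavson_minibatch(spikes, weights, batch_size=2):
    """Same result, but a weight row is fetched once per mini-batch of tokens (assumed scheme)."""
    c_in, c_out = len(weights), len(weights[0])
    out = [[0] * c_out for _ in spikes]
    row_fetches = 0
    for start in range(0, len(spikes), batch_size):
        batch = range(start, min(start + batch_size, len(spikes)))
        for c in range(c_in):
            active = [t for t in batch if spikes[t][c]]
            if not active:
                continue  # no token in this mini-batch spiked on channel c
            row_fetches += 1
            w = weights[c]  # fetched once, reused by every active token
            for t in active:
                for j in range(c_out):
                    out[t][j] += w[j]
    return out, row_fetches


if __name__ == "__main__":
    spikes = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
    weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(spike_gustavson(spikes, weights))            # ([[6, 8], [1, 2], [12, 14]], 5)
    print(spike_gustavson_minibatch(spikes, weights))  # ([[6, 8], [1, 2], [12, 14]], 4)
```

Both variants produce the same outputs; the mini-batch version reads fewer weight rows whenever tokens in the same group spike on the same input channel, which is the kind of saving the abstract attributes to reduced memory access.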
Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Networks
Elastic Inference
Neuromorphic Computing
Hardware Acceleration
Latency Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

elastic inference
spiking neural networks
fine-grained pipeline
address event representation
Gustavson-product
Authors

Kang You
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Chen Nie
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Lee Jun Yan
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Ziling Wei
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Cheng Zou
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Zekai Xu
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Yu Feng
Shanghai Jiao Tong University
Research interests: Computer Architecture
Honglan Jiang
Institute of Chip Design and EDA, School of Integrated Circuits, Shanghai Jiao Tong University, Shanghai, CN
Zhezhi He
Associate Professor, Shanghai Jiao Tong University
Research interests: Intelligent Computing, Neuromorphic Computing, Computer Architecture, EDA