ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing spiking neural network (SNN) accelerators rely on inter-layer synchronization, which prevents elastic inference and delays the response to critical inputs. To overcome this, the authors propose ELSA, a near-SRAM dataflow architecture with a fine-grained pipeline that operates at the level of individual spines/tokens. By combining a bundled address-event representation protocol with a mini-batch spiking Gustavson product, the design removes the inter-layer synchronization barrier and forwards each output as soon as it is produced under event-driven execution, exploiting the inherent sparsity and asynchronous nature of SNNs. At on-par accuracy on a 4-bit ResNet-50, the accelerator achieves a 3.4× speedup and 13.6× higher energy efficiency than the state-of-the-art quantized ANN accelerator ANT, and a 2.9× speedup and 22.1× higher energy efficiency than the state-of-the-art SNN accelerator PAICORE.
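The summary's mention of a spike-driven Gustavson (row-wise) product can be made concrete with a small sketch. It is an illustration only, not the paper's implementation: the function names, the dense list-of-rows weight layout, and in particular the reading of "mini-batch" as several spike vectors sharing one fetch of each weight row are assumptions introduced here.

```python
def spike_row_product(spike_indices, weights, n_out):
    """Row-wise (Gustavson-style) product for one binary spike vector.

    Only the weight rows selected by active inputs are visited, and each
    visit is a pure accumulation because a spike has the value 1.
    """
    acc = [0] * n_out
    for i in spike_indices:
        row = weights[i]
        for j in range(n_out):
            acc[j] += row[j]
    return acc


def minibatch_spike_product(batch_spikes, weights, n_out):
    """Assumed mini-batch variant: each needed weight row is read once
    and accumulated into every batch item that spiked on that input.

    Returns the per-item outputs and the number of weight-row reads.
    """
    # Map each active input index to the batch items that fired on it.
    users = {}
    for b, spikes in enumerate(batch_spikes):
        for i in spikes:
            users.setdefault(i, []).append(b)

    out = [[0] * n_out for _ in batch_spikes]
    row_reads = 0
    for i, items in users.items():
        row = weights[i]  # one (hypothetical) memory read per distinct row
        row_reads += 1
        for b in items:
            for j in range(n_out):
                out[b][j] += row[j]
    return out, row_reads


weights = [[1, 2], [3, 4], [5, 6]]
batch_spikes = [[0, 2], [2], []]  # indices of active inputs per item
outputs, reads = minibatch_spike_product(batch_spikes, weights, n_out=2)
print(outputs, reads)  # [[6, 8], [5, 6], [0, 0]] 2
```

In this toy case, processing the three spike vectors one at a time would read three weight rows, while the shared pass reads two; row 1 is never touched because no input spiked on it.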

📝 Abstract

Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively, enabling responses to salient inputs much earlier than full evaluation. However, existing SNN-specific accelerators cannot capitalize on this property. Layer-by-layer designs emit outputs only after all layers are complete, while time-step-by-time-step designs rely on coarse-grained, layer-wise pipelines that require synchronizing all spines/tokens within a layer. This barrier prevents results from being forwarded immediately, delaying the earliest possible response and forfeiting the benefits of elastic inference. To address these challenges, we propose ELSA, a near-SRAM dataflow architecture that realizes true elastic inference through a fine-grained spine/token-wise pipeline and hardware optimizations tailored to SNNs. ELSA forwards each spine/token immediately upon production, forming a continuous streaming pipeline that substantially reduces the latency to the first response. To enhance this lightweight execution, ELSA introduces a bundled address-event representation protocol to lower network-on-chip (NoC) communication traffic, and leverages a mini-batch spiking Gustavson product to cut memory access and exploit inherent sparsity. Combined with mapping and scheduling optimizations, ELSA achieves efficient, event-driven computation without compromising accuracy. Experiments show that SNNs can outperform quantized artificial neural networks (QANNs) while maintaining on-par accuracy. For a 4-bit ResNet-50, ELSA achieves a 3.4× speedup and 13.6× higher energy efficiency over the SOTA QANN accelerator (ANT), and a 2.9× speedup and 22.1× higher energy efficiency over the SOTA SNN accelerator (PAICORE).
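The abstract's contrast between a layer-wise pipeline that synchronizes all spines/tokens within a layer and a pipeline that forwards each one immediately can be illustrated with a toy latency model. This is a rough sketch under strong assumptions that are not from the paper: every layer spends one fixed time unit per token, each output token depends only on the matching input token, and the function names are hypothetical.

```python
def first_response_with_barrier(num_layers, tokens_per_layer, cost=1):
    """Time until the last layer emits its first token when every earlier
    layer must finish all of its tokens before the next layer may start."""
    return (num_layers - 1) * tokens_per_layer * cost + cost


def first_response_streaming(num_layers, cost=1):
    """Time until the last layer emits its first token when each token is
    forwarded to the next layer as soon as it is produced."""
    return num_layers * cost


layers, tokens = 4, 16
print(first_response_with_barrier(layers, tokens))  # 49
print(first_response_streaming(layers))             # 4
```

Under these assumptions the barrier makes the first response scale with the number of tokens per layer, while streaming makes it scale only with network depth. Real layers with receptive fields or attention need several inputs before producing an output, so the gap in practice would be smaller than this model suggests.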

Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Networks

Elastic Inference

Neuromorphic Computing

Hardware Acceleration

Latency Reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Elastic Inference

Spiking Neural Networks

Fine-Grained Pipeline