🤖 AI Summary
This work addresses the inference latency and energy-efficiency bottlenecks in spiking neural networks (SNNs) caused by sequential membrane potential updates. To overcome these limitations, the authors propose an algorithm-hardware co-optimization approach that introduces a linearly decaying leaky integrate-and-fire (LD-LIF) neuron model and, for the first time, enables parallel in-situ membrane potential updates within an SRAM-based compute-in-memory (CIM) architecture. This innovation reduces the state-update complexity from O(N) to O(1). With only approximately 1% accuracy degradation, the proposed system achieves 1.1×–16.7× lower energy consumption and 15.9×–69× higher energy efficiency, thereby significantly alleviating a key bottleneck in SNN acceleration.
📝 Abstract
Spiking Neural Networks (SNNs) have emerged as a biologically inspired alternative to conventional deep networks, offering event-driven and energy-efficient computation. However, their throughput remains constrained by the serial update of neuron membrane states. While many hardware accelerators and Compute-in-Memory (CIM) architectures efficiently parallelize the synaptic operation (W×I), achieving O(1) complexity for matrix-vector multiplication, the subsequent state-update step still requires O(N) time to refresh all neuron membrane potentials. This mismatch makes the state update the dominant latency and energy bottleneck in SNN inference. To address this challenge, we propose an SRAM-based CIM architecture for SNNs with a Linear Decay Leaky Integrate-and-Fire (LD-LIF) neuron that co-optimizes algorithm and hardware. At the algorithmic level, we replace the conventional exponential membrane decay with a linear decay approximation, converting costly multiplications into simple additions at the cost of only about 1% accuracy. At the architectural level, we introduce an in-memory parallel update scheme that performs in-place decay directly within the SRAM array, eliminating the need for global sequential updates. Evaluated on benchmark SNN workloads, the proposed method achieves a 1.1×–16.7× reduction in synaptic-operation (SOP) energy consumption and 15.9×–69× higher energy efficiency, with negligible accuracy loss relative to the original exponential-decay models. This work highlights that, beyond accelerating the (W×I) computation, optimizing state-update dynamics within CIM architectures is essential for scalable, low-power, and real-time neuromorphic processing.
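The core algorithmic idea, swapping the exponential (multiplicative) membrane leak for a linear (subtractive) one, can be sketched in a few lines. This is a minimal illustration under assumed parameter values (`tau`, `leak`, `v_th` and the hard-reset rule are illustrative choices, not taken from the paper):

```python
def lif_step_exp(v, i_in, tau=0.9, v_th=1.0):
    """Conventional LIF: exponential decay needs a multiply per neuron per step."""
    v = tau * v + i_in
    spike = v >= v_th
    if spike:
        v = 0.0  # hard reset (one common convention; the paper's may differ)
    return v, spike

def lif_step_linear(v, i_in, leak=0.1, v_th=1.0):
    """LD-LIF-style step: a constant subtractive leak replaces the multiply."""
    v = max(v - leak, 0.0) + i_in  # additions/subtractions only
    spike = v >= v_th
    if spike:
        v = 0.0
    return v, spike

# Compare both neurons on the same input current sequence.
inputs = [0.3, 0.0, 0.4, 0.5, 0.0, 0.6]
v_e = v_l = 0.0
for i_in in inputs:
    v_e, s_e = lif_step_exp(v_e, i_in)
    v_l, s_l = lif_step_linear(v_l, i_in)
    print(f"exp: v={v_e:.3f} spike={s_e}   lin: v={v_l:.3f} spike={s_l}")
```

Because the linear leak is a constant subtraction, it maps naturally onto in-place add/subtract operations inside an SRAM array, which is what enables the parallel in-situ update the abstract describes; the multiplicative leak would require per-neuron multiply hardware instead.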