🤖 AI Summary
Spiking neural networks (SNNs) face three key challenges in neuromorphic computing: the lack of general-purpose gradient-based training, inadequate modeling of spike transmission delays, and the high memory overhead of sparse event representations on AI accelerators. This paper introduces the first hardware-aware, differentiable event queue framework, embedding automatic differentiation directly into spike event scheduling and thereby unifying support for delay-aware modeling and sparse computation. We design four queue architectures—tree-based, FIFO, circular buffer, and sort-based—and integrate selective spike dropping to achieve low-memory, end-to-end differentiable spike simulation across CPUs, GPUs, TPUs, and LPUs. Experiments reveal that queue architecture critically impacts performance: GPUs excel at small-scale simulations, while TPUs favor sort-based designs. The framework enables flexible accuracy–efficiency trade-offs, establishing a new paradigm for efficient SNN training on heterogeneous accelerators.
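To make the "event queue with delays" idea concrete, here is a minimal sketch of one of the four designs, a tree-based (binary-heap) queue that schedules spikes for delivery after a per-synapse delay. All names and the API are illustrative assumptions, not the paper's implementation, and the differentiable/autograd machinery is omitted:

```python
import heapq

class SpikeEventQueue:
    """Hypothetical tree-based (min-heap) spike event queue with delays."""

    def __init__(self):
        self._heap = []      # min-heap ordered by arrival time
        self._counter = 0    # tie-breaker for simultaneous arrivals

    def push(self, spike_time, delay, neuron_id):
        """Schedule a spike emitted at `spike_time` to arrive `delay` later."""
        heapq.heappush(self._heap, (spike_time + delay, self._counter, neuron_id))
        self._counter += 1

    def pop_until(self, t):
        """Deliver all spikes with arrival time <= t, in arrival order."""
        delivered = []
        while self._heap and self._heap[0][0] <= t:
            arrival, _, nid = heapq.heappop(self._heap)
            delivered.append((arrival, nid))
        return delivered

# Usage: schedule two delayed spikes, then step the simulation clock.
q = SpikeEventQueue()
q.push(spike_time=0.0, delay=1.5, neuron_id=3)
q.push(spike_time=0.5, delay=0.25, neuron_id=7)
early = q.pop_until(1.0)   # neuron 7's spike arrives at t=0.75
late = q.pop_until(2.0)    # neuron 3's spike arrives at t=1.5
```

A heap gives O(log n) insertion and ordered extraction, which suits CPUs; the paper's FIFO, circular-buffer, and sort-based variants trade that ordering guarantee for better fit on accelerator memory hierarchies.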
📝 Abstract
Spiking neural networks (SNNs), central to computational neuroscience and neuromorphic machine learning (ML), require efficient simulation and gradient-based training. While AI accelerators offer promising speedups, gradient-based SNN implementations typically represent sparse spike events with dense, memory-heavy data structures. Existing exact-gradient methods lack generality, and current simulators often omit delayed spikes or handle them inefficiently. We address this by deriving gradient computation through spike event queues, including delays, and by implementing memory-efficient, gradient-enabled event queue structures, which we benchmark across CPU, GPU, TPU, and LPU platforms. We find that queue design strongly shapes performance: CPUs, as expected, perform well with traditional tree-based or FIFO implementations; GPUs excel with ring buffers for smaller simulations, yet under higher memory pressure prefer sparser data structures; and TPUs seem to favor an implementation based on sorting intrinsics. Selective spike dropping provides a simple performance–accuracy trade-off, which future autograd frameworks could enhance by supporting diverging primal/tangent data structures.
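The ring-buffer design and the selective-spike-dropping trade-off can be sketched together: each slot of a fixed-capacity circular buffer holds the spikes arriving at one future timestep, and when a slot is full, excess spikes are dropped rather than reallocated. This is a hypothetical illustration of the memory/accuracy trade-off under assumed names, not the paper's code:

```python
class RingDelayQueue:
    """Hypothetical fixed-capacity ring-buffer delay queue with spike dropping."""

    def __init__(self, max_delay_steps, slot_capacity):
        self.slots = [[] for _ in range(max_delay_steps)]
        self.head = 0                  # slot index for the current timestep
        self.capacity = slot_capacity  # fixed memory budget per timestep
        self.dropped = 0               # spikes sacrificed to stay in budget

    def push(self, delay_steps, neuron_id):
        """Schedule a spike to arrive `delay_steps` ticks in the future."""
        slot = self.slots[(self.head + delay_steps) % len(self.slots)]
        if len(slot) < self.capacity:
            slot.append(neuron_id)
        else:
            self.dropped += 1          # selective spike dropping

    def advance(self):
        """Return and clear the spikes arriving at the current tick."""
        arrived = self.slots[self.head]
        self.slots[self.head] = []
        self.head = (self.head + 1) % len(self.slots)
        return arrived

# Usage: a slot with capacity 2 silently drops the third spike.
q = RingDelayQueue(max_delay_steps=4, slot_capacity=2)
q.push(1, neuron_id=0)
q.push(1, neuron_id=1)
q.push(1, neuron_id=2)  # slot already full -> dropped
q.advance()             # tick 0: nothing arrives
q.advance()             # tick 1: neurons 0 and 1 arrive; q.dropped == 1
```

The fixed pre-allocated slots make this layout friendly to GPU memory, at the cost of bounded accuracy loss; the dropped-spike counter is one way to expose that trade-off to the user.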