🤖 AI Summary
Spiking neural networks (SNNs) face three key challenges in neuromorphic computing: the lack of general-purpose gradient-based training, inadequate modeling of spike transmission delays, and the high memory overhead of sparse event representations on AI accelerators. This paper introduces the first hardware-aware, differentiable event queue framework, embedding automatic differentiation directly into spike event scheduling and thereby unifying support for delay-aware modeling and sparse computation. We design four queue architectures—tree-based, FIFO, circular buffer, and sort-based—and integrate selective spike dropping to achieve low-memory, end-to-end differentiable spike simulation across CPUs, GPUs, TPUs, and LPUs. Experiments reveal that queue architecture critically impacts performance: GPUs excel at small-scale simulations, while TPUs favor sort-based designs. The framework enables flexible accuracy–efficiency trade-offs, establishing a new paradigm for efficient SNN training on heterogeneous accelerators.
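To make the "event queue with delays" idea concrete, here is a minimal sketch of one of the four designs, a tree-based (binary-heap) queue that schedules spikes for delivery after a per-synapse delay. All names and the API are illustrative assumptions, not the paper's implementation, and the differentiable/autograd machinery is omitted:

```python
import heapq

class SpikeEventQueue:
    """Hypothetical tree-based (min-heap) spike event queue with delays."""

    def __init__(self):
        self._heap = []      # min-heap ordered by arrival time
        self._counter = 0    # tie-breaker for simultaneous arrivals

    def push(self, spike_time, delay, neuron_id):
        """Schedule a spike emitted at `spike_time` to arrive `delay` later."""
        heapq.heappush(self._heap, (spike_time + delay, self._counter, neuron_id))
        self._counter += 1

    def pop_until(self, t):
        """Deliver all spikes with arrival time <= t, in arrival order."""
        delivered = []
        while self._heap and self._heap[0][0] <= t:
            arrival, _, nid = heapq.heappop(self._heap)
            delivered.append((arrival, nid))
        return delivered

# Usage: schedule two delayed spikes, then step the simulation clock.
q = SpikeEventQueue()
q.push(spike_time=0.0, delay=1.5, neuron_id=3)
q.push(spike_time=0.5, delay=0.25, neuron_id=7)
early = q.pop_until(1.0)   # neuron 7's spike arrives at t=0.75
late = q.pop_until(2.0)    # neuron 3's spike arrives at t=1.5
```

A heap gives O(log n) insertion and ordered extraction, which suits CPUs; the paper's FIFO, circular-buffer, and sort-based variants trade that ordering guarantee for better fit on accelerator memory hierarchies.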
📝 Abstract
Spiking neural networks (SNNs), central to computational neuroscience and neuromorphic machine learning (ML), require efficient simulation and gradient-based training. While AI accelerators offer promising speedups, gradient-based SNN implementations typically represent sparse spike events with dense, memory-heavy data structures. Existing exact-gradient methods lack generality, and current simulators often omit delayed spikes or handle them inefficiently. We address this by deriving gradient computation through spike event queues, including delays, and by implementing memory-efficient, gradient-enabled event queue structures, which we benchmark across CPU, GPU, TPU, and LPU platforms. We find that queue design strongly shapes performance: CPUs, as expected, perform well with traditional tree-based or FIFO implementations; GPUs excel with ring buffers for smaller simulations, yet under higher memory pressure prefer sparser data structures; and TPUs seem to favor an implementation based on sorting intrinsics. Selective spike dropping provides a simple performance–accuracy trade-off, which future autograd frameworks could enhance by supporting diverging primal/tangent data structures.
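The ring-buffer design and the selective-spike-dropping trade-off can be sketched together: each slot of a fixed-capacity circular buffer holds the spikes arriving at one future timestep, and when a slot is full, excess spikes are dropped rather than reallocated. This is a hypothetical illustration of the memory/accuracy trade-off under assumed names, not the paper's code:

```python
class RingDelayQueue:
    """Hypothetical fixed-capacity ring-buffer delay queue with spike dropping."""

    def __init__(self, max_delay_steps, slot_capacity):
        self.slots = [[] for _ in range(max_delay_steps)]
        self.head = 0                  # slot index for the current timestep
        self.capacity = slot_capacity  # fixed memory budget per timestep
        self.dropped = 0               # spikes sacrificed to stay in budget

    def push(self, delay_steps, neuron_id):
        """Schedule a spike to arrive `delay_steps` ticks in the future."""
        slot = self.slots[(self.head + delay_steps) % len(self.slots)]
        if len(slot) < self.capacity:
            slot.append(neuron_id)
        else:
            self.dropped += 1          # selective spike dropping

    def advance(self):
        """Return and clear the spikes arriving at the current tick."""
        arrived = self.slots[self.head]
        self.slots[self.head] = []
        self.head = (self.head + 1) % len(self.slots)
        return arrived

# Usage: a slot with capacity 2 silently drops the third spike.
q = RingDelayQueue(max_delay_steps=4, slot_capacity=2)
q.push(1, neuron_id=0)
q.push(1, neuron_id=1)
q.push(1, neuron_id=2)  # slot already full -> dropped
q.advance()             # tick 0: nothing arrives
q.advance()             # tick 1: neurons 0 and 1 arrive; q.dropped == 1
```

The fixed pre-allocated slots make this layout friendly to GPU memory, at the cost of bounded accuracy loss; the dropped-spike counter is one way to expose that trade-off to the user.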