🤖 AI Summary
To address the challenges of maintaining contextual information in spiking neural networks (SNNs) for long-sequence tasks—while simultaneously meeting stringent hardware energy-efficiency and memory-budget constraints—this paper proposes an algorithm-hardware co-design framework. Inspired by cortical fast-slow memory mechanisms, the core innovation is a dual-path architecture: an explicit slow-memory pathway ensures stable long-term state retention and event-driven sparsity, while a fast pathway handles transient perception. The design integrates low-dimensional state modeling, heterogeneous sparse dataflow scheduling, and near-memory computing hardware to enable efficient algorithm-hardware synergy. Experiments demonstrate state-of-the-art accuracy on long-sequence benchmarks, with 40–60% fewer parameters, 4.1× higher hardware throughput, and 5.3× improved energy efficiency compared to prior approaches.
📝 Abstract
Spiking neural networks excel at event-driven sensing, yet maintaining task-relevant context over long timescales, particularly in hardware with tight energy and memory budgets, remains a core challenge in the field. We address this challenge through a novel algorithm-hardware co-design effort. At the algorithm level, inspired by the brain's cortical fast-slow organization, we introduce a neural network with an explicit slow-memory pathway that, combined with fast spiking activity, yields a dual memory pathway (DMP) architecture in which each layer maintains a compact low-dimensional state that summarizes recent activity and modulates spiking dynamics. This explicit memory stabilizes learning while preserving event-driven sparsity, achieving competitive accuracy on long-sequence benchmarks with 40-60% fewer parameters than equivalent state-of-the-art spiking neural networks. At the hardware level, we introduce a near-memory-compute architecture that fully exploits the DMP architecture by retaining its compact shared state while optimizing dataflow across heterogeneous sparse-spike and dense-memory pathways. Experimental results demonstrate more than a 4x increase in throughput and over a 5x improvement in energy efficiency compared with state-of-the-art implementations. Together, these contributions show that biological principles can guide functional abstractions that are both algorithmically effective and hardware-efficient, establishing a scalable co-design paradigm for real-time neuromorphic computation and learning.
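The abstract describes each layer pairing fast spiking dynamics with a compact low-dimensional slow state that summarizes recent activity and modulates the spiking pathway. The paper's exact equations are not given here, so the following is a minimal illustrative sketch of what such a dual-pathway layer update could look like: a leaky integrate-and-fire fast path plus an exponentially averaged low-dimensional memory fed back as a modulating current. All names (`dmp_layer_step`, `W_mem`, `U`), decay constants, and the modulation mechanism are assumptions, not the authors' formulation.

```python
import numpy as np

def dmp_layer_step(x, v, m, W_in, W_mem, U,
                   alpha=0.9, beta=0.99, threshold=1.0):
    """One timestep of a hypothetical dual-memory-pathway (DMP) layer.

    Fast path: leaky integrate-and-fire membrane `v` driven by input spikes `x`.
    Slow path: low-dimensional state `m` (dim << layer width) that summarizes
    recent spiking activity and feeds back to modulate membrane dynamics.
    Illustrative only; not the paper's exact equations.
    """
    # Slow memory feedback modulates the fast pathway's input current.
    modulation = U @ m                       # project slow state up to layer width
    current = W_in @ x + modulation
    v = alpha * v + current                  # leaky membrane integration
    spikes = (v >= threshold).astype(float)  # event-driven, sparse binary output
    v = v * (1.0 - spikes)                   # reset neurons that fired
    # Slow state: compact exponential summary of recent layer activity.
    m = beta * m + (1.0 - beta) * (W_mem @ spikes)
    return spikes, v, m
```

Because `m` has far fewer dimensions than the layer itself, it is the kind of small, shared state that a near-memory-compute design could keep resident while streaming the sparse spike traffic separately, which is consistent with the heterogeneous sparse-spike / dense-memory dataflow the abstract describes.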