🤖 AI Summary
To address the challenge of simultaneously achieving high accuracy and low memory overhead in spiking neural network (SNN) training, this paper proposes a spatio-temporal decoupled learning framework. Spatially, it partitions the network into subnetworks (divide-and-conquer) and dynamically selects layer subsets; temporally, it decouples the backward pass across time steps to enable online learning, aided by a lightweight auxiliary network for knowledge distillation. This is the first framework to achieve *dual* spatial and temporal decoupling, preserving inter-layer collaboration while respecting strict memory constraints. Experiments across seven static and event-based vision datasets show accuracy on par with full backpropagation through time (BPTT); on ImageNet, GPU memory consumption is reduced to 25% of standard BPTT, substantially improving training efficiency and model deployability.
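To make the temporal half concrete, below is a minimal PyTorch sketch of online temporal learning: the membrane potential is detached between time steps, so each step's gradient graph is independent and memory no longer grows with the number of time steps T. This is an illustrative sketch, not the authors' implementation (see the linked repository for that); the rectangular surrogate gradient, unit threshold, soft reset, and `online_train_step` helper are all assumptions.

```python
import torch
import torch.nn.functional as F

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient (assumed form)."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= 1.0).float()  # fire when the potential crosses threshold 1.0

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Pass gradient only in a window around the threshold.
        return grad_out * ((v - 1.0).abs() < 0.5).float()

def online_train_step(layer, readout, x_seq, target, opt, decay=0.5):
    """x_seq: (T, batch, features); one optimizer update per time step."""
    with torch.no_grad():
        v = torch.zeros_like(layer(x_seq[0]))  # membrane potential, no graph
    for x_t in x_seq:
        u = decay * v + layer(x_t)          # v is detached: no temporal graph
        s_t = SurrogateSpike.apply(u)
        loss = F.cross_entropy(readout(s_t), target)
        opt.zero_grad()
        loss.backward()                     # graph covers this time step only
        opt.step()
        v = (u * (1.0 - s_t)).detach()      # soft reset; cut graph before next step
    return loss.item()
```

With, e.g., `layer = torch.nn.Linear(784, 256)`, `readout = torch.nn.Linear(256, 10)`, and an SGD optimizer over both modules' parameters, peak memory is independent of T, which is the property the temporal decoupling targets.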
📝 Abstract
Spiking neural networks (SNNs) have gained significant attention for their potential to enable energy-efficient artificial intelligence. However, effective and efficient training of SNNs remains an unresolved challenge. While backpropagation through time (BPTT) achieves high accuracy, it incurs substantial memory overhead. In contrast, biologically plausible local learning methods are more memory-efficient but struggle to match the accuracy of BPTT. To bridge this gap, we propose spatio-temporal decoupled learning (STDL), a novel training framework that decouples spatial and temporal dependencies to achieve both high accuracy and training efficiency for SNNs. Specifically, to achieve spatial decoupling, STDL partitions the network into smaller subnetworks, each of which is trained independently using an auxiliary network. To mitigate the loss of synergy among subnetworks caused by spatial decoupling, STDL constructs each subnetwork's auxiliary network by selecting the largest subset of layers from its subsequent network layers under a memory constraint. Furthermore, STDL decouples dependencies across time steps to enable efficient online learning. Extensive evaluations on seven static and event-based vision datasets demonstrate that STDL consistently outperforms local learning methods and achieves accuracy comparable to BPTT at a considerably reduced GPU memory cost. Notably, STDL reduces GPU memory usage by 4x relative to BPTT on ImageNet. Therefore, this work opens up a promising avenue for memory-efficient SNN training. Code is available at https://github.com/ChenxiangMA/STDL.
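As an illustration of the spatial side, the sketch below shows one way the auxiliary-network construction could be realized: given a subnetwork's subsequent layers and a memory budget, keep the largest subset of layers that fits. This is a hedged sketch under stated assumptions, not the paper's implementation; `estimate_memory` is a hypothetical per-layer cost model (e.g., activation plus parameter memory during training), and the paper's actual criterion may differ. Note that maximizing the *number* of layers under a sum-of-costs budget is solved exactly by taking the cheapest layers first.

```python
# Hedged sketch: select the largest subset of a subnetwork's subsequent
# layers that fits a memory budget. `estimate_memory` is a hypothetical
# helper mapping a layer to its estimated training-memory cost.
def select_auxiliary_layers(subsequent_layers, memory_budget, estimate_memory):
    """Return the largest subset of `subsequent_layers` whose total estimated
    memory cost fits `memory_budget`, preserving the original layer order."""
    # Cheapest-first greedy is exactly optimal when the objective is the
    # number of selected layers subject to a total-cost budget.
    order = sorted(range(len(subsequent_layers)),
                   key=lambda i: estimate_memory(subsequent_layers[i]))
    chosen, used = set(), 0.0
    for i in order:
        cost = estimate_memory(subsequent_layers[i])
        if used + cost <= memory_budget:
            chosen.add(i)
            used += cost
    # Keep the original ordering for the auxiliary network's forward pass.
    return [subsequent_layers[i] for i in sorted(chosen)]
```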