🤖 AI Summary
Spiking Neural Networks (SNNs) face a fundamental spatiotemporal credit assignment challenge: backpropagation through time (BPTT) violates the locality observed in biological systems through non-local weight updates and incurs prohibitive memory overhead, while existing local learning rules, though temporally local via eligibility traces, rely on auxiliary layer-wise weight matrices for spatial credit assignment, limiting scalability and edge deployment. This paper introduces Traces Propagation (TP), a forward-only, fully local learning rule that jointly resolves spatial and temporal credit assignment by combining trace-based eligibility propagation with a layer-wise contrastive loss, eliminating the need for auxiliary matrices. TP reduces the memory footprint of training, scales to deep SNNs such as VGG-9, outperforms prior fully local rules on NMNIST and SHD, achieves competitive accuracy on DVS-GESTURE and DVS-CIFAR10, and demonstrates feasibility for keyword spotting and resource-constrained learning at the edge.
📝 Abstract
Spiking Neural Networks (SNNs) provide an efficient framework for processing dynamic spatio-temporal signals and for investigating the learning principles underlying biological neural systems. A key challenge in training SNNs is solving both spatial and temporal credit assignment. The dominant approach for training SNNs is Backpropagation Through Time (BPTT) with surrogate gradients. However, BPTT is in stark contrast to the spatial and temporal locality observed in biological neural systems and leads to high computational and memory demands, limiting efficient training strategies and on-device learning. Although existing local learning rules achieve local temporal credit assignment by leveraging eligibility traces, they fail to address spatial credit assignment without resorting to auxiliary layer-wise matrices, which increase memory overhead and hinder scalability, especially on embedded devices. In this work, we propose Traces Propagation (TP), a forward-only, memory-efficient, scalable, and fully local learning rule that combines eligibility traces with a layer-wise contrastive loss without requiring auxiliary layer-wise matrices. TP outperforms other fully local learning rules on the NMNIST and SHD datasets. On more complex datasets such as DVS-GESTURE and DVS-CIFAR10, TP achieves competitive performance and scales effectively to deeper SNN architectures such as VGG-9, while providing favorable memory scaling, compared to prior scalable fully local rules, on datasets with a large number of classes. Finally, we show that TP is well suited for practical fine-tuning tasks, such as keyword spotting on the Google Speech Commands dataset, paving the way for efficient learning at the edge.
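To make the idea of combining eligibility traces with a layer-local contrastive objective concrete, the following is a minimal NumPy sketch of a single leaky integrate-and-fire (LIF) layer trained forward-only. All names, constants, and the specific goodness-style contrastive signal are illustrative assumptions, not the paper's actual TP implementation (which the abstract does not specify): each synapse accumulates an eligibility trace pairing post-synaptic spikes with low-pass-filtered pre-synaptic activity, and a scalar layer-local modulator, derived from contrasting a "positive" and a "negative" spike train, gates the weight update.

```python
# Hypothetical sketch: forward-only local learning for a spiking layer,
# combining per-synapse eligibility traces with a layer-wise contrastive
# (goodness-style) signal. Illustrative only; NOT the paper's TP rule.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT, T = 20, 10, 50   # inputs, neurons, time steps (assumed sizes)
TAU_TRACE = 0.9               # eligibility-trace decay per step
V_TH, LEAK = 1.0, 0.9         # LIF firing threshold and membrane leak
LR = 0.01                     # local learning rate

W = rng.normal(0.0, 0.3, (N_OUT, N_IN))

def run_layer(W, spikes_in):
    """Run an LIF layer over T steps; return output spikes and the
    per-synapse eligibility trace accumulated along the way."""
    v = np.zeros(N_OUT)
    pre_trace = np.zeros(N_IN)        # low-pass filter of input spikes
    elig = np.zeros_like(W)           # eligibility trace per synapse
    out = np.zeros((T, N_OUT))
    for t in range(T):
        pre_trace = TAU_TRACE * pre_trace + spikes_in[t]
        v = LEAK * v + W @ spikes_in[t]
        s = (v >= V_TH).astype(float) # spike where threshold is crossed
        v = v * (1.0 - s)             # reset membrane on spike
        out[t] = s
        # eligibility: pair post-synaptic spikes with filtered pre activity
        elig += np.outer(s, pre_trace)
    return out, elig

def local_contrastive_update(W, pos_in, neg_in):
    """Forward-only update: a scalar modulator pushes the layer's
    'goodness' (mean firing) up for positive samples and down for
    negatives; the eligibility trace localizes credit per synapse."""
    pos_out, pos_elig = run_layer(W, pos_in)
    neg_out, neg_elig = run_layer(W, neg_in)
    g_pos, g_neg = pos_out.mean(), neg_out.mean()
    m_pos = 1.0 / (1.0 + np.exp(g_pos - 0.5))    # increase pos goodness
    m_neg = -1.0 / (1.0 + np.exp(0.5 - g_neg))   # decrease neg goodness
    W += LR * (m_pos * pos_elig + m_neg * neg_elig)
    return g_pos, g_neg

pos = (rng.random((T, N_IN)) < 0.3).astype(float)  # "positive" spike train
neg = (rng.random((T, N_IN)) < 0.3).astype(float)  # "negative" spike train
for _ in range(20):
    g_pos, g_neg = local_contrastive_update(W, pos, neg)
print(f"goodness pos={g_pos:.3f}  neg={g_neg:.3f}")
```

The key property this sketch shares with the approach described in the abstract is that every quantity used in the update (membrane potential, pre-synaptic trace, layer-wise loss signal) is available locally during the forward pass, so no backward pass through time or auxiliary feedback matrices is needed; how TP actually constructs its layer-wise contrastive loss is detailed in the paper itself.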