🤖 AI Summary
To address the limitations of backpropagation through time (BPTT) and surrogate-gradient methods for spiking neural network (SNN) training, namely suboptimal accuracy and computational and memory overheads that grow linearly with the number of timesteps, this paper proposes an enhanced self-distillation framework. Methodologically, it introduces: (1) a lightweight artificial neural network (ANN) branch that takes the SNN's intermediate-layer spike rates as input, enabling cross-architecture knowledge transfer; (2) the first decomposition of teacher signals into reliable and unreliable components, where only the reliable component guides SNN optimization, improving convergence stability; and (3) the integration of rate-based backpropagation with self-distillation, eliminating temporal unrolling and gradient truncation. Evaluated on CIFAR-10, CIFAR-100, CIFAR10-DVS, and ImageNet, the method significantly reduces training complexity while surpassing state-of-the-art SNN training approaches in accuracy, validating this efficient co-optimization paradigm.
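The reliable/unreliable decomposition in point (2) can be illustrated with a minimal sketch. The criterion below (a teacher output is "reliable" when its top-1 prediction matches the ground-truth label) and the function name `reliable_distill_loss` are assumptions for illustration, not the paper's exact formulation; the distillation loss is the standard KL divergence from teacher to student, averaged over the reliable samples only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reliable_distill_loss(teacher_logits, student_logits, labels):
    """Hypothetical sketch of reliable-only self-distillation:
    keep teacher outputs whose top-1 prediction matches the label
    (the assumed reliability criterion) and average KL(teacher || student)
    over those samples; unreliable teacher signals are discarded."""
    eps = 1e-12
    kls = []
    for t_log, s_log, y in zip(teacher_logits, student_logits, labels):
        t = softmax(t_log)
        s = softmax(s_log)
        if max(range(len(t)), key=t.__getitem__) != y:
            continue  # unreliable: teacher misclassifies this sample
        kls.append(sum(ti * (math.log(ti + eps) - math.log(si + eps))
                       for ti, si in zip(t, s)))
    return sum(kls) / len(kls) if kls else 0.0
```

In a full training loop, this term would be combined with the usual cross-entropy loss on the SNN (or ANN-branch) outputs; here it only demonstrates how unreliable teacher signals are masked out of the distillation objective.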
📝 Abstract
Spiking Neural Networks (SNNs) exhibit exceptional energy efficiency on neuromorphic hardware due to their sparse activation patterns. However, conventional training methods based on surrogate gradients and Backpropagation Through Time (BPTT) not only lag behind Artificial Neural Networks (ANNs) in performance, but also incur significant computational and memory overheads that grow linearly with the temporal dimension. To enable high-performance SNN training under limited computational resources, we propose an enhanced self-distillation framework, jointly optimized with rate-based backpropagation. Specifically, the firing rates of intermediate SNN layers are projected onto lightweight ANN branches, and high-quality knowledge generated by the model itself is used to optimize substructures through the ANN pathways. Unlike traditional self-distillation paradigms, we observe that low-quality self-generated knowledge may hinder convergence. To address this, we decouple the teacher signal into reliable and unreliable components, ensuring that only reliable knowledge is used to guide the optimization of the model. Extensive experiments on CIFAR-10, CIFAR-100, CIFAR10-DVS, and ImageNet demonstrate that our method reduces training complexity while achieving high-performance SNN training. Our code is available at https://github.com/Intelli-Chip-Lab/enhanced-self-distillation-framework-for-snn.