🤖 AI Summary
To address the prohibitively high energy cost of training Spiking Transformers, this paper proposes an energy-efficient training architecture that co-optimizes spatial and temporal sparsity. For the first time in Spiking Transformer training, the method jointly models dynamic spatiotemporal sparsity: it designs dynamic sparse masks based on gradient sensitivity, performs event-driven spatiotemporal sparse computation in the forward pass, applies sparse gradient updates in backpropagation, and integrates hardware-friendly low-precision quantization. Crucially, the approach matches full-precision accuracy while cutting training energy consumption by up to 72% across multiple benchmark tasks, offering a scalable and practical path to large-scale spiking neural network training.
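The summary names four coupled mechanisms but not their exact formulations, so the following is a minimal PyTorch sketch of just one of them: sensitivity-driven sparse gradient updates. The names `SparseGradLinear` and `gradient_sensitivity_mask` are hypothetical, and top-k gradient magnitude is an assumed stand-in for the paper's gradient-sensitivity criterion.

```python
import torch


def gradient_sensitivity_mask(grad: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Binary mask keeping the top `keep_ratio` fraction of entries by
    gradient magnitude -- an assumed proxy for the paper's sensitivity score."""
    k = max(1, int(keep_ratio * grad.numel()))
    # The k-th largest |grad| value is the (numel - k + 1)-th smallest.
    threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    return (grad.abs() >= threshold).to(grad.dtype)


class SparseGradLinear(torch.nn.Linear):
    """Linear layer whose weight gradient is sparsified after each backward
    pass, approximating 'sparse gradient updates in backpropagation'."""

    def __init__(self, in_features: int, out_features: int, keep_ratio: float = 0.25):
        super().__init__(in_features, out_features)
        self.keep_ratio = keep_ratio
        self.weight.register_hook(self._mask_grad)  # runs during backward

    def _mask_grad(self, grad: torch.Tensor) -> torch.Tensor:
        return grad * gradient_sensitivity_mask(grad, self.keep_ratio)


# Usage with a sparse binary spike tensor: event-driven inputs are mostly zero,
# so most of the dense matmul work could in principle be skipped on hardware.
layer = SparseGradLinear(128, 64, keep_ratio=0.25)
spikes = (torch.rand(32, 128) < 0.1).float()  # ~10% firing rate
layer(spikes).sum().backward()                # layer.weight.grad is now masked
```

In a full Spiking Transformer the same masking would extend across time steps and to the event-driven forward pass; the low-precision quantization component is omitted here for brevity.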
🏛️ Affiliations
(1) Pengcheng Laboratory, (2) Southern University of Science and Technology, (3) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, (4) University of Chinese Academy of Sciences