🤖 AI Summary
To address the prohibitively high energy cost of training Spiking Transformers, this paper proposes an energy-efficient training architecture that co-optimizes spatial and temporal sparsity. For the first time in Spiking Transformer training, the method jointly models dynamic spatiotemporal sparsity: it designs dynamic sparse masks based on gradient sensitivity, performs event-driven spatiotemporal sparse computation in the forward pass, applies sparse gradient updates in backpropagation, and integrates hardware-friendly low-precision quantization. Crucially, the approach matches full-precision accuracy while cutting training energy consumption by up to 72% across multiple benchmark tasks, offering a scalable and practical path to large-scale spiking neural network training.
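The summary names four coupled mechanisms but not their exact formulations, so the following is a minimal PyTorch sketch of just one of them: sensitivity-driven sparse gradient updates. The names `SparseGradLinear` and `gradient_sensitivity_mask` are hypothetical, and top-k gradient magnitude is an assumed stand-in for the paper's gradient-sensitivity criterion.

```python
import torch


def gradient_sensitivity_mask(grad: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Binary mask keeping the top `keep_ratio` fraction of entries by
    gradient magnitude -- an assumed proxy for the paper's sensitivity score."""
    k = max(1, int(keep_ratio * grad.numel()))
    # The k-th largest |grad| value is the (numel - k + 1)-th smallest.
    threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    return (grad.abs() >= threshold).to(grad.dtype)


class SparseGradLinear(torch.nn.Linear):
    """Linear layer whose weight gradient is sparsified after each backward
    pass, approximating 'sparse gradient updates in backpropagation'."""

    def __init__(self, in_features: int, out_features: int, keep_ratio: float = 0.25):
        super().__init__(in_features, out_features)
        self.keep_ratio = keep_ratio
        self.weight.register_hook(self._mask_grad)  # runs during backward

    def _mask_grad(self, grad: torch.Tensor) -> torch.Tensor:
        return grad * gradient_sensitivity_mask(grad, self.keep_ratio)


# Usage with a sparse binary spike tensor: event-driven inputs are mostly zero,
# so most of the dense matmul work could in principle be skipped on hardware.
layer = SparseGradLinear(128, 64, keep_ratio=0.25)
spikes = (torch.rand(32, 128) < 0.1).float()  # ~10% firing rate
layer(spikes).sum().backward()                # layer.weight.grad is now masked
```

In a full Spiking Transformer the same masking would extend across time steps and to the event-driven forward pass; the low-precision quantization component is omitted here for brevity.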
🏛️ Affiliations
(1) Pengcheng Laboratory, (2) Southern University of Science and Technology, (3) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, (4) University of Chinese Academy of Sciences