Spiking Transformer with Spatial-Temporal Attention

📅 2024-09-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing spiking Transformers model only spatial attention, neglecting the intrinsic temporal dynamics of spike sequences and thereby limiting representational capacity. This paper proposes STAtten, a spiking Transformer that jointly models spatial and temporal self-attention within the spike domain. It introduces a block-wise spatio-temporal computation strategy that fuses spatial and temporal features without increasing model parameters or computational overhead. The architecture builds on leaky integrate-and-fire (LIF) neurons and binary sparse computation, and its plug-and-play design keeps it compatible with mainstream spiking Transformers. Extensive experiments demonstrate significant accuracy improvements over state-of-the-art methods on both static (CIFAR-10/100, ImageNet) and neuromorphic (CIFAR10-DVS, N-Caltech101) datasets. The implementation is publicly available.
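The summary mentions leaky integrate-and-fire (LIF) neurons as the spiking building block. As a minimal sketch of the standard LIF dynamics (leaky membrane integration, threshold firing, hard reset), with illustrative values for the decay `tau` and threshold `v_th` rather than the paper's actual settings:

```python
import numpy as np

def lif_forward(x, tau=0.5, v_th=1.0):
    """Minimal LIF neuron sketch: x is input current of shape (T, ...).

    Returns a binary spike train of the same shape. `tau` and `v_th`
    are illustrative hyperparameters, not taken from the paper.
    """
    u = np.zeros(x.shape[1:])            # membrane potential
    spikes = np.zeros_like(x)
    for t in range(x.shape[0]):
        u = tau * u + x[t]               # leaky integration
        s = (u >= v_th).astype(x.dtype)  # fire when threshold is crossed
        u = u * (1.0 - s)                # hard reset after a spike
        spikes[t] = s
    return spikes
```

The binary output is what enables the sparse, multiplication-free computation the summary alludes to: downstream matrix products against 0/1 spike tensors reduce to accumulations.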

📝 Abstract
Spike-based Transformer presents a compelling and energy-efficient alternative to traditional Artificial Neural Network (ANN)-based Transformers, achieving impressive results through sparse binary computations. However, existing spike-based transformers predominantly focus on spatial attention while neglecting crucial temporal dependencies inherent in spike-based processing, leading to suboptimal feature representation and limited performance. To address this limitation, we propose Spiking Transformer with Spatial-Temporal Attention (STAtten), a simple and straightforward architecture that efficiently integrates both spatial and temporal information in the self-attention mechanism. STAtten introduces a block-wise computation strategy that processes information in spatial-temporal chunks, enabling comprehensive feature capture while maintaining the same computational complexity as previous spatial-only approaches. Our method can be seamlessly integrated into existing spike-based transformers without architectural overhaul. Extensive experiments demonstrate that STAtten significantly improves the performance of existing spike-based transformers across both static and neuromorphic datasets, including CIFAR10/100, ImageNet, CIFAR10-DVS, and N-Caltech101. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/STAtten
Problem

Research questions and friction points this paper is trying to address.

Existing spike-based Transformers model only spatial attention, neglecting the temporal dependencies inherent in spike-based processing.
This omission leads to suboptimal feature representation and limited performance.
Temporal modeling must be added without increasing computational complexity or requiring an architectural overhaul.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates spatial-temporal attention in spike-based transformers
Uses block-wise computation for efficient feature capture
Maintains computational complexity of spatial-only approaches
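The block-wise strategy above can be sketched as follows: time steps are grouped into chunks, the tokens within a chunk are flattened into a joint spatio-temporal sequence, and attention is computed over that sequence, so each token attends across both space and time at the cost of spatial-only attention. This is a hedged NumPy sketch assuming a softmax-free spike attention of the form (QKᵀ)V, as is common in spiking Transformers; the function name, block size, and scale are illustrative, not the paper's implementation:

```python
import numpy as np

def st_block_attention(q, k, v, block=2, scale=0.125):
    """Block-wise spatial-temporal attention sketch.

    q, k, v: binary spike tensors of shape (T, N, D) with
    T time steps, N spatial tokens, D channels.
    """
    T, N, D = q.shape
    assert T % block == 0, "T must divide evenly into temporal blocks"
    out = np.empty_like(q, dtype=float)
    for t0 in range(0, T, block):
        # Flatten a chunk of `block` time steps into block*N joint tokens.
        qb = q[t0:t0 + block].reshape(block * N, D)
        kb = k[t0:t0 + block].reshape(block * N, D)
        vb = v[t0:t0 + block].reshape(block * N, D)
        # Spike-friendly linear attention: (Q K^T) V with a fixed scale,
        # no softmax, so binary inputs keep the computation sparse.
        attn = qb @ kb.T * scale
        out[t0:t0 + block] = (attn @ vb).reshape(block, N, D)
    return out
```

Because each chunk still contains block·N tokens of dimension D, the per-chunk cost matches spatial-only attention run over the same tokens, which is how the complexity stays unchanged while the receptive field gains a temporal axis.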
Donghyun Lee
Department of Electrical Engineering, Yale University
Yuhang Li
Yale University
Machine Learning
Youngeun Kim
Applied Scientist, Amazon AWS AI Labs
Machine Learning · Efficient AI · Neuromorphic Computing
Shiting Xiao
Department of Electrical Engineering, Yale University
P. Panda
Department of Electrical Engineering, Yale University