STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high latency and computational overhead of Spiking Vision Transformers (SNN-ViTs) arising from multi-timestep inference, this paper proposes the first spatiotemporal co-adaptive computation framework for SNNs. Unlike conventional Adaptive Computation Time (ACT) methods, which rest on temporal similarity assumptions that do not hold in SNNs and on architecturally rigid designs, the approach introduces an Integrated Spike Patch Splitting (I-SPS) module to enhance temporal stability and a two-dimensional Adaptive Spiking Self-Attention (A-SSA) mechanism enabling joint spatial-temporal token pruning. Evaluated on CIFAR-10, CIFAR-100, and ImageNet, the method reduces energy consumption by up to 45.9%, 43.8%, and 30.1%, respectively, while surpassing state-of-the-art SNN-ViT models in classification accuracy. This work constitutes the first successful instantiation and empirical validation of the ACT principle in spiking ViT architectures.

📝 Abstract
Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. While various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, they remain fragmented. Although the principles of adaptive computation time (ACT) offer a robust foundation for a unified approach, their application to SNN-based vision Transformers (ViTs) is hindered by two core issues: the violation of ACT's temporal similarity prerequisite and a static architecture fundamentally unsuited to its principles. To address these challenges, we propose STAS (Spatio-Temporal Adaptive computation time for Spiking transformers), a framework that co-designs the static architecture and the dynamic computation policy. STAS introduces an integrated spike patch splitting (I-SPS) module to establish temporal stability by creating a unified input representation, thereby solving the architectural problem of temporal dissimilarity. This stability, in turn, allows our adaptive spiking self-attention (A-SSA) module to perform two-dimensional token pruning across both spatial and temporal axes. Implemented on spiking Transformer architectures and validated on CIFAR-10, CIFAR-100, and ImageNet, STAS reduces energy consumption by up to 45.9%, 43.8%, and 30.1%, respectively, while simultaneously improving accuracy over SOTA models.
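The core idea, ACT-style halting applied jointly over spatial and temporal token axes, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, tensor shapes, and the sigmoid-cumsum halting rule are assumptions borrowed from generic ACT-style token pruning.

```python
import numpy as np

def act_prune_mask(halt_logits, threshold=0.99):
    """Hypothetical sketch of ACT-style 2D token pruning.

    halt_logits: array of shape [L, T, N] giving per-layer halting
    logits for T timesteps and N spatial tokens. A token at position
    (t, n) stops receiving computation at the first layer where its
    cumulative halting probability exceeds `threshold`;
    keep_mask[l, t, n] is True while the token is still active.
    """
    halt_prob = 1.0 / (1.0 + np.exp(-halt_logits))  # sigmoid per layer
    cum_halt = np.cumsum(halt_prob, axis=0)         # accumulate over depth
    keep_mask = cum_halt < threshold                # active until saturation
    return keep_mask

# Toy example: 4 layers, 2 timesteps, 3 spatial tokens.
rng = np.random.default_rng(0)
logits = rng.normal(0.0, 1.0, size=(4, 2, 3))
mask = act_prune_mask(logits)
```

Because the cumulative halting probability is monotone non-decreasing in depth, a token that halts stays halted in all deeper layers, which is what lets pruning translate directly into skipped attention computation.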
Problem

Research questions and friction points this paper is trying to address.

Reducing high latency and computational overhead in spiking neural networks
Addressing temporal dissimilarity issues in spiking transformer architectures
Enabling dynamic computation across spatial and temporal dimensions efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-temporal adaptive computation for energy efficiency
Integrated spike patch splitting for temporal stability
Adaptive spiking self-attention for 2D token pruning