STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current spiking Transformers suffer from fragmented implementations and inconsistent evaluation protocols, hindering fair benchmarking and mechanistic analysis. To address this, we propose STEP—the first unified evaluation platform for spiking Transformers—supporting diverse tasks (classification, segmentation, detection) and multimodal data (static, event-based, sequential). STEP integrates major backends (e.g., SpikingJelly, BrainCog), offers modular components, and introduces an energy-efficiency model that jointly accounts for quantization and spike sparsity. We establish a standardized evaluation protocol and a unified energy consumption model, systematically reproducing state-of-the-art models and conducting ablation studies on attention mechanisms, neuron types, and architectural choices. Our analysis reveals that prevailing approaches rely heavily on CNN-based front-ends and lack native temporal modeling capabilities. Crucially, quantized artificial neural networks (ANNs) achieve energy efficiency comparable to—or even exceeding—that of current spiking Transformers, providing empirical grounding for future native spiking architecture design.

📝 Abstract
Spiking Transformers have recently emerged as promising architectures for combining the efficiency of spiking neural networks with the representational power of self-attention. However, the lack of standardized implementations, evaluation pipelines, and consistent design choices has hindered fair comparison and principled analysis. In this paper, we introduce STEP, a unified benchmark framework for Spiking Transformers that supports a wide range of tasks, including classification, segmentation, and detection across static, event-based, and sequential datasets. STEP provides modular support for diverse components such as spiking neurons, input encodings, surrogate gradients, and multiple backends (e.g., SpikingJelly, BrainCog). Using STEP, we reproduce and evaluate several representative models, and conduct systematic ablation studies on attention design, neuron types, encoding schemes, and temporal modeling capabilities. We also propose a unified analytical model for energy estimation, accounting for spike sparsity, bitwidth, and memory access, and show that quantized ANNs may offer comparable or better energy efficiency. Our results suggest that current Spiking Transformers rely heavily on convolutional front-ends and lack strong temporal modeling, underscoring the need for spike-native architectural innovations. The full code is available at: https://github.com/Fancyssc/STEP
Problem

Research questions and friction points this paper is trying to address.

Standardized evaluation of Spiking Transformers is lacking
Unified benchmark for diverse tasks and datasets needed
Energy efficiency comparison between Spiking Transformers and ANNs unclear
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark framework for Spiking Transformers
Modular support for diverse neural components
Unified energy estimation model with spike sparsity
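The unified energy model described above can be illustrated with a minimal first-order sketch. The constants below are the widely cited 45 nm per-operation estimates (MAC ≈ 4.6 pJ, AC ≈ 0.9 pJ); the function names, the linear bitwidth scaling, and the omission of memory-access costs are simplifying assumptions of this sketch, not the paper's actual model.

```python
# Hedged sketch of a spike-sparsity / bitwidth energy comparison.
# Assumed constants: 45 nm estimates for a 32-bit MAC and AC.
# The paper's full model also accounts for memory access, which is
# deliberately omitted here for brevity.

E_MAC = 4.6e-12  # joules per 32-bit multiply-accumulate (ANN)
E_AC = 0.9e-12   # joules per 32-bit accumulate (SNN, binary spikes)

def snn_layer_energy(flops, timesteps, spike_rate):
    """Binary spikes replace MACs with ACs; only active ops cost energy."""
    return flops * timesteps * spike_rate * E_AC

def ann_layer_energy(flops, bitwidth=32):
    """Crude linear bitwidth scaling for a quantized ANN layer (assumption)."""
    return flops * E_MAC * (bitwidth / 32)

if __name__ == "__main__":
    flops = 1e9  # operations in one hypothetical layer
    e_snn = snn_layer_energy(flops, timesteps=4, spike_rate=0.2)
    e_ann8 = ann_layer_energy(flops, bitwidth=8)
    print(f"SNN  (T=4, 20% spike rate): {e_snn * 1e3:.2f} mJ")
    print(f"ANN  (8-bit quantized):     {e_ann8 * 1e3:.2f} mJ")
```

Even this toy calculation shows why the comparison is close: multiple timesteps and imperfect sparsity erode the per-operation advantage of accumulates, which is consistent with the paper's finding that quantized ANNs can match or beat current spiking Transformers.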
Sicheng Shen
BrainCog Lab, CASIA; School of Future Tech., UCAS; Long-term AI; Zhongguancun Academy
Dongcheng Zhao
Beijing Institute of AI Safety and Governance
Spiking Neural Networks · Event Based Vision · Brain-inspired AI · LLM Safety
Linghao Feng
BrainCog Lab, CASIA; Long-term AI
Zeyang Yue
Beihang University
Jindong Li
BrainCog Lab, CASIA; Long-term AI
Tenglong Li
Institute of Automation, Chinese Academy of Sciences
Hardware Architecture
Guobin Shen
BrainCog Lab, CASIA; Long-term AI
Yi Zeng
BrainCog Lab, CASIA; Long-term AI