STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current spiking Transformers suffer from fragmented implementations and inconsistent evaluation protocols, hindering fair benchmarking and mechanistic analysis. To address this, we propose STEP—the first unified evaluation platform for spiking Transformers—supporting diverse tasks (classification, segmentation, detection) and multimodal data (static, event-based, sequential). STEP integrates major backends (e.g., SpikingJelly, BrainCog), offers modular components, and introduces an energy-efficiency model that jointly accounts for quantization and spike sparsity. We establish a standardized evaluation protocol and a unified energy consumption model, systematically reproducing state-of-the-art models and conducting ablation studies on attention mechanisms, neuron types, and architectural choices. Our analysis reveals that prevailing approaches rely heavily on CNN-based front-ends and lack native temporal modeling capabilities. Crucially, quantized artificial neural networks (ANNs) achieve energy efficiency comparable to—or even exceeding—that of current spiking Transformers, providing empirical grounding for future native spiking architecture design.

📝 Abstract
Spiking Transformers have recently emerged as promising architectures for combining the efficiency of spiking neural networks with the representational power of self-attention. However, the lack of standardized implementations, evaluation pipelines, and consistent design choices has hindered fair comparison and principled analysis. In this paper, we introduce STEP, a unified benchmark framework for Spiking Transformers that supports a wide range of tasks, including classification, segmentation, and detection across static, event-based, and sequential datasets. STEP provides modular support for diverse components such as spiking neurons, input encodings, surrogate gradients, and multiple backends (e.g., SpikingJelly, BrainCog). Using STEP, we reproduce and evaluate several representative models, and conduct systematic ablation studies on attention design, neuron types, encoding schemes, and temporal modeling capabilities. We also propose a unified analytical model for energy estimation, accounting for spike sparsity, bitwidth, and memory access, and show that quantized ANNs may offer comparable or better energy efficiency. Our results suggest that current Spiking Transformers rely heavily on convolutional front-ends and lack strong temporal modeling, underscoring the need for spike-native architectural innovations. The full code is available at: https://github.com/Fancyssc/STEP
Problem

Research questions and friction points this paper is trying to address.

Standardized evaluation of Spiking Transformers is lacking
Unified benchmark for diverse tasks and datasets needed
Energy efficiency comparison between Spiking Transformers and ANNs unclear
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark framework for Spiking Transformers
Modular support for diverse neural components
Unified energy estimation model with spike sparsity
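The unified energy model described above can be illustrated with a minimal first-order sketch. The constants below are the widely cited 45 nm per-operation estimates (MAC ≈ 4.6 pJ, AC ≈ 0.9 pJ); the function names, the linear bitwidth scaling, and the omission of memory-access costs are simplifying assumptions of this sketch, not the paper's actual model.

```python
# Hedged sketch of a spike-sparsity / bitwidth energy comparison.
# Assumed constants: 45 nm estimates for a 32-bit MAC and AC.
# The paper's full model also accounts for memory access, which is
# deliberately omitted here for brevity.

E_MAC = 4.6e-12  # joules per 32-bit multiply-accumulate (ANN)
E_AC = 0.9e-12   # joules per 32-bit accumulate (SNN, binary spikes)

def snn_layer_energy(flops, timesteps, spike_rate):
    """Binary spikes replace MACs with ACs; only active ops cost energy."""
    return flops * timesteps * spike_rate * E_AC

def ann_layer_energy(flops, bitwidth=32):
    """Crude linear bitwidth scaling for a quantized ANN layer (assumption)."""
    return flops * E_MAC * (bitwidth / 32)

if __name__ == "__main__":
    flops = 1e9  # operations in one hypothetical layer
    e_snn = snn_layer_energy(flops, timesteps=4, spike_rate=0.2)
    e_ann8 = ann_layer_energy(flops, bitwidth=8)
    print(f"SNN  (T=4, 20% spike rate): {e_snn * 1e3:.2f} mJ")
    print(f"ANN  (8-bit quantized):     {e_ann8 * 1e3:.2f} mJ")
```

Even this toy calculation shows why the comparison is close: multiple timesteps and imperfect sparsity erode the per-operation advantage of accumulates, which is consistent with the paper's finding that quantized ANNs can match or beat current spiking Transformers.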
Sicheng Shen
BrainCog Lab, CASIA; School of Future Tech., UCAS; Long-term AI; Zhongguancun Academy
Dongcheng Zhao
Beijing Institute of AI Safety and Governance
Spiking Neural Networks · Event Based Vision · Brain-inspired AI · LLM Safety
Linghao Feng
BrainCog Lab, CASIA; Long-term AI
Zeyang Yue
Beihang University
Jindong Li
BrainCog Lab, CASIA; Long-term AI
Tenglong Li
Institute of Automation, Chinese Academy of Sciences
Hardware Architecture
Guobin Shen
BrainCog Lab, CASIA; Long-term AI
Yi Zeng
BrainCog Lab, CASIA; Long-term AI