Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Spike-based Transformers suffer from low energy efficiency on GPUs due to architectural mismatch between spiking computation and conventional digital hardware. Method: This work proposes the first heterogeneous hardware architecture integrating analog in-memory computing (AIMC) with a stochastic spike attention (SSA) engine. Feed-forward and fully connected layers are mapped onto energy-efficient AIMC arrays, while attention mechanisms are processed in parallel by a custom SSA engine—enabling end-to-end hardware acceleration of SNN-based Transformers. Results: Experiments show that our design reduces energy consumption by 13× over state-of-the-art digital Transformer accelerators while maintaining comparable throughput; it achieves up to 1.9× energy savings versus state-of-the-art digital ASICs for spiking Transformers, with accuracy matching that of ANN-Transformers on GPUs. This is the first demonstration validating the effectiveness and superiority of analog–digital co-design for hardware-accelerated spatiotemporal sequence modeling.

📝 Abstract
The integration of neuromorphic computing and transformers through spiking neural networks (SNNs) offers a promising path to energy-efficient sequence modeling, with the potential to overcome the energy-intensive nature of the artificial neural network (ANN)-based transformers. However, the algorithmic efficiency of SNN-based transformers cannot be fully exploited on GPUs due to architectural incompatibility. This paper introduces Xpikeformer, a hybrid analog-digital hardware architecture designed to accelerate SNN-based transformer models. The architecture integrates analog in-memory computing (AIMC) for feedforward and fully connected layers, and a stochastic spiking attention (SSA) engine for efficient attention mechanisms. We detail the design, implementation, and evaluation of Xpikeformer, demonstrating significant improvements in energy consumption and computational efficiency. Through image classification tasks and wireless communication symbol detection tasks, we show that Xpikeformer can achieve inference accuracy comparable to the GPU implementation of ANN-based transformers. Evaluations reveal that Xpikeformer achieves a 13× reduction in energy consumption at approximately the same throughput as the state-of-the-art (SOTA) digital accelerator for ANN-based transformers. Additionally, Xpikeformer achieves up to 1.9× energy reduction compared to the optimal digital ASIC projection of SOTA SNN-based transformers.
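The core idea behind the stochastic spiking attention (SSA) engine — computing attention directly over binary spike trains, with output spikes drawn stochastically — can be illustrated with a minimal NumPy sketch. This is a conceptual illustration only, not the authors' hardware algorithm: the coincidence-count scoring, the normalization, and the Bernoulli output sampling are assumptions chosen to show why spike-domain attention avoids full-precision multiplications.

```python
import numpy as np

def stochastic_spiking_attention(Q, K, V, rng):
    """Conceptual sketch of attention over binary spike trains.

    Q, K, V: binary arrays of shape (T, d) -- spike trains over
    T time steps. Because inputs are 0/1, the score matrix reduces
    to spike-coincidence counts (no full-precision multiplies),
    and outputs are sampled stochastically from the resulting rates.
    """
    scores = Q @ K.T                      # coincidence counts, shape (T, T)
    # Normalize each row to a probability distribution (guard empty rows).
    probs = scores / np.maximum(scores.sum(axis=1, keepdims=True), 1)
    rates = probs @ V                     # expected value-spike rates in [0, 1]
    # Bernoulli sampling: emit a binary output spike with probability `rates`.
    return (rng.random(rates.shape) < rates).astype(np.uint8)

rng = np.random.default_rng(0)
T, d = 8, 16
# Random sparse spike trains standing in for spiking Q/K/V projections.
Q = (rng.random((T, d)) < 0.3).astype(np.uint8)
K = (rng.random((T, d)) < 0.3).astype(np.uint8)
V = (rng.random((T, d)) < 0.3).astype(np.uint8)
out = stochastic_spiking_attention(Q, K, V, rng)   # binary, shape (T, d)
```

Because every operand is binary, the score computation maps to AND gates and counters rather than multiply-accumulate units — the property that lets the paper's SSA engine process attention in cheap digital logic while AIMC arrays handle the feedforward layers.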
Problem

Research questions and friction points this paper is trying to address.

Energy-intensive nature of ANN-based transformer inference
Architectural incompatibility between SNN-based transformers and GPUs
Lack of end-to-end hardware acceleration for spiking attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid analog-digital hardware acceleration
Analog in-memory computing for feedforward layers
Stochastic spiking attention engine for efficiency
Zihang Song
Department of Engineering, King’s College London, London WC2R 2LS, U.K.
Prabodh Katti
Department of Engineering, King’s College London, London WC2R 2LS, U.K.
Osvaldo Simeone
King's College London
Information theory, machine learning, quantum information processing, wireless systems
Bipin Rajendran
Professor of Intelligent Computing Systems at King's College London
Nanoscale logic and memory devices, neuromorphic computation