🤖 AI Summary
Existing ANN-to-SNN conversion methods struggle with the complex nonlinear operations in Transformers and rely heavily on post-training fine-tuning, hindering efficient deployment of spiking Transformers. This paper proposes a training-free ANN-to-SNN conversion framework built on Multi-Basis Exponential Decay (MBE) spiking neurons, which accurately approximate key nonlinearities—including Softmax, LayerNorm, and GeLU—without weight updates or architectural modifications, enabling plug-and-play integration. Coupled with an event-driven execution paradigm and a multi-basis encoding strategy, the framework supports mainstream architectures such as ViT, RoBERTa, and GPT-2. Evaluated across computer vision (CV), natural language understanding (NLU), and natural language generation (NLG) tasks, it achieves near-lossless accuracy (average degradation <0.5%), reduces inference latency by 3.2–5.8×, and significantly improves energy efficiency, establishing a scalable paradigm for high-performance spiking Transformer deployment.
📝 Abstract
Leveraging the event-driven paradigm, Spiking Neural Networks (SNNs) offer a promising approach for constructing energy-efficient Transformer architectures. Compared to directly trained Spiking Transformers, ANN-to-SNN conversion methods bypass the high training costs. However, existing methods still suffer from notable limitations: they fail to handle the nonlinear operations in Transformer architectures effectively and require additional fine-tuning of pre-trained ANNs. To address these issues, we propose a high-performance, training-free ANN-to-SNN conversion framework tailored to Transformer architectures. Specifically, we introduce a Multi-Basis Exponential Decay (MBE) neuron, which combines an exponential decay strategy with a multi-basis encoding method to efficiently approximate various nonlinear operations, removing the need for weight modifications to pre-trained ANNs. Extensive experiments across diverse tasks (CV, NLU, NLG) and mainstream Transformer architectures (ViT, RoBERTa, GPT-2) demonstrate that our method achieves near-lossless conversion accuracy with significantly lower latency. This provides a promising pathway for the efficient and scalable deployment of Spiking Transformers in real-world applications.
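To make the core idea concrete, here is a toy sketch of how a spike train with exponentially decaying step weights can encode an activation and approximate a Transformer nonlinearity such as GeLU. This is an illustrative stand-in only: the function names (`mbe_encode`, `mbe_decode`), the greedy encoding rule, the base-2 decay, and the value range `[0, 1)` are all assumptions for the example, not the paper's actual MBE neuron model.

```python
import math

def gelu(x):
    # Gaussian Error Linear Unit, exact form via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mbe_encode(value, num_steps=8, base=2.0):
    """Greedily emit a binary spike train whose step t carries the
    exponentially decaying weight base**-(t+1).  Hypothetical helper
    for illustration; the real neuron dynamics differ."""
    spikes, residual = [], value
    for t in range(num_steps):
        w = base ** -(t + 1)
        s = 1 if residual >= w else 0
        residual -= s * w
        spikes.append(s)
    return spikes

def mbe_decode(spikes, base=2.0):
    # Reconstruct the encoded value from the weighted spike train.
    return sum(s * base ** -(t + 1) for t, s in enumerate(spikes))

# Encode an input, reconstruct it, then apply the nonlinearity:
# with 8 steps the reconstruction error is below 2**-8, so the
# GeLU output of the decoded value stays close to the exact one.
x = 0.63
x_hat = mbe_decode(mbe_encode(x, num_steps=8))
approx, exact = gelu(x_hat), gelu(x)
```

The point of the sketch is that more timesteps shrink the encoding error geometrically, which is one way a spiking representation can approach near-lossless accuracy at low latency; the paper's MBE neuron additionally handles Softmax and LayerNorm, which this toy example does not cover.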