Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing ANN-to-SNN conversion methods struggle to implement the nonlinear operators in Transformers (such as Softmax, SiLU, and normalization layers) efficiently, because these operators rely on division, exponentiation, and ℓ₂-norm computations that leaky integrate-and-fire (LIF) neuron dynamics do not naturally support. This work proposes a plug-and-play spiking operator framework that, for the first time, systematically decomposes these nonlinearities into three primitives: division, exponentiation, and ℓ₂ norms. Each primitive is realized with populations of LIF neurons combined with lightweight bit-shift scaling, yielding floating-point-free, training-free, spike-friendly approximations. The modular design integrates into existing ANN-to-SNN pipelines without fine-tuning and supports the common Transformer nonlinearities, with less than 1% accuracy degradation across the evaluated Transformer models when the targeted operators are replaced.
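For readers unfamiliar with LIF dynamics, the following is a minimal, generic sketch of a single integer-only leaky integrate-and-fire neuron. It is not the paper's population scheme: the function name `lif_spike_count`, the parameter values, the reset-by-subtraction rule, and the use of a right shift to implement the leak are all assumptions chosen for illustration, so that the update needs no floating-point arithmetic.

```python
def lif_spike_count(current, steps=64, threshold=256, leak_shift=4):
    """Count spikes of one integer LIF neuron driven by a constant input.

    Hypothetical illustration: the leak removes v / 2**leak_shift per step
    via a right shift, so every operation stays in integer arithmetic.
    """
    v = 0          # membrane potential
    spikes = 0
    for _ in range(steps):
        # Integrate the input and apply the shift-based leak.
        v += current - (v >> leak_shift)
        # Fire and reset by subtraction when the threshold is reached.
        if v >= threshold:
            v -= threshold
            spikes += 1
    return spikes


if __name__ == "__main__":
    for current in (0, 64, 128, 256):
        print(current, lif_spike_count(current))
```

The spike count grows with the input current, which is the basic rate-coding behavior that spike-based approximations of continuous functions build on.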

📝 Abstract

ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations of Transformer linear-algebra operations, while providing limited support for key nonlinear operators. This gap limits compatibility with neuromorphic-style execution constraints, where such nonlinearities typically require division, exponentiation, or norm computations that are not naturally supported by standard leaky integrate-and-fire (LIF) dynamics. To close this gap, we propose a plug-and-play framework that implements spike-friendly approximations for Transformer nonlinearities and integrates into existing ANN-to-SNN pipelines. Our method decomposes these nonlinear computations into three recurring primitives (division, exponentiation, and $\ell_2$ norms) and realizes them via population computation using LIF neuron groups, combined with lightweight bit-shift scaling to avoid floating-point arithmetic. By composing these primitives as modular operator blocks, our framework supports common Transformer nonlinearities (e.g., Softmax, SiLU, and normalization) without any fine-tuning. Experiments on a range of Transformer-based LLMs show that selectively replacing the targeted nonlinear operators incurs less than a $1\%$ accuracy drop across all evaluated tasks.
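The decomposition described in the abstract can be sketched in plain Python. This is only an illustration of how Softmax, SiLU, and a normalization layer reduce to the three primitives. The helper names `prim_exp`, `prim_div`, and `prim_l2` are hypothetical, and they are exact floating-point stand-ins rather than the spiking approximations the paper builds. RMSNorm without a learned gain is assumed as the normalization example, with `eps` added to the RMS for simplicity.

```python
import math

# Exact floating-point stand-ins for the three primitives (hypothetical names).
# A spiking pipeline would swap in approximate, spike-driven implementations.
def prim_exp(x):
    return math.exp(x)

def prim_div(a, b):
    return a / b

def prim_l2(xs):
    return math.sqrt(sum(x * x for x in xs))

def softmax(xs):
    # Softmax = exponentiation followed by division by the sum.
    m = max(xs)  # subtract the max for numerical stability
    exps = [prim_exp(x - m) for x in xs]
    total = sum(exps)
    return [prim_div(e, total) for e in exps]

def silu(x):
    # SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)).
    return prim_div(x, 1.0 + prim_exp(-x))

def rms_norm(xs, eps=1e-6):
    # RMS(x) = ||x||_2 / sqrt(n); each element is then divided by it.
    rms = prim_div(prim_l2(xs), math.sqrt(len(xs)))
    return [prim_div(x, rms + eps) for x in xs]

if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(silu(1.0))
    print(rms_norm([3.0, 4.0]))
```

Because every operator routes through the same three helpers, replacing those helpers is enough to change how all of the operators are computed, which is the sense in which such a design is modular.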

Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Networks

Nonlinear Operators

ANN-to-SNN Conversion

Transformers

Neuromorphic Computing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking Neural Networks

ANN-to-SNN Conversion

Nonlinear Operators