🤖 AI Summary
Low-bit quantization of spiking neural networks (SNNs), particularly spiking Transformers, on resource-constrained devices suffers from severe performance degradation due to information distortion in quantized self-attention mechanisms. To address this, we propose a two-tier co-optimization framework: (1) at the neuron level, an information-enhanced leaky integrate-and-fire (LIF) neuron dynamically rectifies quantized spike-based attention distributions; (2) at the architectural level, fine-grained knowledge distillation—jointly optimized with mutual information entropy—aligns attention response distributions between the teacher artificial neural network (ANN) and the student SNN. Our approach synergistically integrates quantized neural network design, spiking neural dynamics modeling, and structured knowledge transfer. Evaluated on ImageNet, our method achieves 80.3% top-1 accuracy, reducing energy consumption by 6.0× and parameter storage by 8.1× over baseline models. To our knowledge, this is the first work to achieve both high accuracy and high energy efficiency in sub-4-bit spiking Transformers.
📝 Abstract
Spiking neural networks (SNNs) are emerging as a promising energy-efficient alternative to traditional artificial neural networks due to their spike-driven paradigm. However, recent research in the SNN domain has mainly focused on enhancing accuracy by designing large-scale Transformer structures, which typically rely on substantial computational resources, limiting their deployment on resource-constrained devices. To overcome this challenge, we propose a quantized spike-driven Transformer baseline (QSD-Transformer), which achieves reduced resource demands by utilizing low bit-width parameters. Regrettably, the QSD-Transformer often suffers from severe performance degradation. In this paper, we first conduct empirical analysis and find that the bimodal distribution of quantized spike-driven self-attention (Q-SDSA) leads to spike information distortion (SID) during quantization, causing significant performance degradation. To mitigate this issue, we take inspiration from mutual information entropy and propose a bi-level optimization strategy to rectify the information distribution in Q-SDSA. Specifically, at the lower level, we introduce an information-enhanced LIF neuron to rectify the information distribution in Q-SDSA. At the upper level, we propose a fine-grained distillation scheme for the QSD-Transformer to align the distribution in Q-SDSA with that in the counterpart ANN. By integrating the bi-level optimization strategy, the QSD-Transformer can attain enhanced energy efficiency without sacrificing its high-performance advantage. For instance, when compared to the prior SNN benchmark on ImageNet, the QSD-Transformer achieves 80.3% top-1 accuracy, accompanied by significant reductions of 6.0$\times$ and 8.1$\times$ in power consumption and model size, respectively. Code is available at https://github.com/bollossom/QSD-Transformer.
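To make the two ingredients above concrete, here is a minimal sketch of (a) uniform low bit-width weight quantization and (b) a plain leaky integrate-and-fire (LIF) neuron. This is a generic illustration, not the paper's implementation: the actual QSD-Transformer uses an information-enhanced LIF variant that additionally reshapes the quantized attention distribution, and its quantizer details may differ. All names here (`quantize_uniform`, `LIFNeuron`) are hypothetical.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric uniform quantization to `bits` bits (generic sketch,
    not the paper's exact scheme). Returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for signed 4-bit
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                              # simulate low-bit weights

class LIFNeuron:
    """Plain leaky integrate-and-fire neuron. The paper's
    information-enhanced LIF builds on these same dynamics."""
    def __init__(self, tau=2.0, v_th=1.0):
        self.tau = tau        # membrane time constant
        self.v_th = v_th      # firing threshold
        self.v = 0.0          # membrane potential

    def step(self, current):
        # Leaky integration toward the input current.
        self.v = self.v + (current - self.v) / self.tau
        # Threshold crossing emits a binary spike, then hard reset.
        spike = 1.0 if self.v >= self.v_th else 0.0
        if spike:
            self.v = 0.0
        return spike
```

For example, `quantize_uniform(np.array([0.5, -1.0, 0.25]), bits=4)` snaps each weight onto a 15-level signed grid, and driving `LIFNeuron().step(...)` over several timesteps yields the binary spike train that replaces the multiply-heavy activations of a conventional Transformer.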