🤖 AI Summary
To address the resource constraints of edge devices, this paper proposes BESTformer—the first binary event-driven spiking Transformer—designed to achieve high accuracy with ultra-low computational and memory overhead. Methodologically, it employs 1-bit weight quantization and binary attention maps, and introduces a novel Coupled Information Enhancement (CIE) distillation framework that leverages a reversible network architecture and mutual information maximization to bridge the information gap between the binary student and the full-precision teacher. The design tightly integrates spiking neural dynamics, event-driven attention, and binary optimization. Experiments demonstrate that BESTformer significantly outperforms existing binary spiking neural networks (SNNs) on both static and neuromorphic datasets, maintaining competitive accuracy while drastically reducing parameter count and FLOPs. This work establishes a compact and energy-efficient paradigm for edge AI.
📝 Abstract
Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm that combines the high performance of Transformers with the energy efficiency of SNNs. However, the larger model size and increased computational demands of the Transformer structure limit their practicality in resource-constrained scenarios. In this paper, we integrate binarization techniques into Transformer-based SNNs and propose the Binary Event-Driven Spiking Transformer, i.e., BESTformer. The proposed BESTformer significantly reduces storage and computational demands by representing weights and attention maps with only 1 bit. However, BESTformer suffers from a severe performance drop relative to its full-precision counterpart due to the limited representational capability of binarization. To address this issue, we propose a Coupled Information Enhancement (CIE) method, which consists of a reversible framework and information enhancement distillation. By maximizing the mutual information between the binary model and its full-precision counterpart, the CIE method effectively mitigates the performance degradation of BESTformer. Extensive experiments on static and neuromorphic datasets demonstrate that our method achieves superior performance to other binary SNNs, showcasing its potential as a compact yet high-performance model for resource-limited edge devices.
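The 1-bit weight representation described in the abstract is commonly implemented by taking the sign of each weight and scaling by the layer's mean absolute value, as in XNOR-Net-style binarization. The sketch below illustrates that generic scheme only; the paper's exact quantizer, scaling choice, and gradient handling may differ.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha}.

    alpha is the mean absolute value of the weights — a common
    scaling factor in binary networks (illustrative assumption;
    BESTformer's actual quantizer may differ). With weights
    restricted to +/-alpha, a matrix multiply reduces to
    additions/subtractions plus one scalar multiply by alpha.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

# Toy example: a 2x2 full-precision weight matrix
w = np.array([[0.4, -0.2],
              [-0.6, 0.8]])
w_bin, alpha = binarize_weights(w)
```

During training, binary networks typically keep full-precision latent weights and backpropagate through the non-differentiable `sign` with a straight-through estimator; that detail is omitted here for brevity.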