🤖 AI Summary
Spiking Transformers lack relative position modeling, which hinders their ability to capture spatial relationships among sequence elements and image patches. To address this, the paper introduces Gray coding to spiking neural networks (SNNs) for the first time and proposes a hardware-friendly, low-precision approximate relative positional encoding (RPE) method that supports both 1D sequential and 2D patch-based inputs in a unified way. By leveraging the robustness of Gray code and integrating it with spiking self-attention, the approach preserves the sparse computation advantage of SNNs while enhancing positional awareness. Experiments on time-series forecasting, text classification, and patch-based image classification show consistent performance gains, supporting the effectiveness and generalizability of the proposed RPE for efficient relative position modeling in spiking Transformers.
📝 Abstract
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes. Their energy efficiency and temporal processing capabilities give them great potential across a range of tasks. SNNs with self-attention mechanisms (Spiking Transformers) have recently shown strong results in tasks such as sequence modeling and image classification. However, integrating positional information, which is essential for capturing sequential relationships in data, remains a challenge in Spiking Transformers. In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers that uses Gray code as its foundation. We provide a proof that the method partially captures relative positional information for sequential tasks. Additionally, we extend our RPE approach to a two-dimensional form suitable for image patch processing. We evaluate the proposed RPE methods on several tasks, including time series forecasting, text classification, and patch-based image classification. Our experimental results demonstrate that incorporating RPE significantly enhances performance by effectively capturing relative positional information.
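To make the "partially capturing" idea concrete, here is a minimal sketch of one property of standard binary-reflected Gray code: for two positions whose offset is a power of two, the Hamming distance between their Gray codes does not depend on the absolute position, while for other offsets it can vary. That this is the exact property the paper relies on is an assumption, and the helper names `gray`, `hamming`, and `distances_at_offset` are hypothetical, chosen for illustration only. The sketch does not model spiking neurons or attention.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def distances_at_offset(offset: int, num_positions: int = 64) -> set:
    """Collect the Hamming distances between Gray codes of positions
    i and i + offset, for i in [0, num_positions).

    A single-element result means the distance depends only on the
    relative offset, not on the absolute position i.
    """
    return {hamming(gray(i), gray(i + offset)) for i in range(num_positions)}


if __name__ == "__main__":
    # Power-of-two offsets give one distance; other offsets may give several.
    for offset in (1, 2, 3, 4, 8):
        print(offset, sorted(distances_at_offset(offset)))
```

Because the codes are binary, they can in principle be compared using spike-like 0/1 values, which may be why a Gray-code scheme suits the binary activations of an SNN. Treat that reading as an interpretation rather than a statement of the paper's design.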