Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding

📅 2026-01-30
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work addresses the gap between theoretical energy efficiency and practical deployment constraints in spiking neural networks (SNNs), whose energy evaluations often overlook real hardware costs such as data movement. To bridge this gap, the authors propose the Matterhorn architecture, integrating Masked Time-to-First-Spike (M-TTFS) encoding with a “dead zone” sparsification strategy to align spiking activity with input statistics and maximize sparsity. Furthermore, Matterhorn employs memristive synaptic units (MSUs) to enable analog compute-in-memory (CIM), thereby eliminating weight-access energy overhead. Evaluated on the GLUE benchmark, the proposed approach achieves an average accuracy improvement of 1.42% over existing SNNs and delivers 2.31× higher energy efficiency, establishing a new state-of-the-art for SNN performance.

πŸ“ Abstract
Spiking neural networks (SNNs) have emerged as a promising candidate for energy-efficient LLM inference. However, current energy evaluations for SNNs primarily focus on counting accumulate operations, and fail to account for real-world hardware costs such as data movement, which can consume nearly 80% of the total energy. In this paper, we propose Matterhorn, a spiking transformer that integrates a novel masked time-to-first-spike (M-TTFS) encoding method to reduce spike movement and a memristive synapse unit (MSU) to eliminate weight access overhead. M-TTFS employs a masking strategy that reassigns the zero-energy silent state (a spike train of all 0s) to the most frequent membrane potential rather than the lowest. This aligns the coding scheme with the data distribution, minimizing spike movement energy without information loss. We further propose a 'dead zone' strategy that maximizes sparsity by mapping all values within a given range to the silent state. At the hardware level, the MSU utilizes compute-in-memory (CIM) technology to perform analog integration directly within memory, effectively removing weight access costs. On the GLUE benchmark, Matterhorn establishes a new state-of-the-art, surpassing existing SNNs by 1.42% in average accuracy while delivering a 2.31 times improvement in energy efficiency.
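The masking and dead-zone ideas described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes inputs normalized to [0, 1], estimates the most frequent value with a coarse histogram, silences everything within a `dead_zone` radius of that mode, and applies plain TTFS (larger value fires earlier) to the remaining inputs. The function name, parameters, and normalization are assumptions for illustration only.

```python
import numpy as np

def m_ttfs_encode(values, n_steps=8, dead_zone=0.1):
    """Illustrative sketch of masked TTFS encoding with a dead zone.

    Assumptions (not from the paper): values lie in [0, 1]; the most
    frequent value is estimated via a coarse histogram; anything within
    `dead_zone` of it maps to the all-zero "silent" spike train.
    """
    values = np.asarray(values, dtype=float)

    # Estimate the most frequent value (the mode) with a histogram.
    counts, edges = np.histogram(values, bins=n_steps, range=(0.0, 1.0))
    k = int(np.argmax(counts))
    mode = 0.5 * (edges[k] + edges[k + 1])

    # Dead zone: every value close to the mode becomes the silent state.
    silent = np.abs(values - mode) <= dead_zone

    # Plain TTFS for the rest: larger values fire at earlier time steps.
    spike_time = np.clip(
        ((1.0 - values) * (n_steps - 1)).round().astype(int),
        0, n_steps - 1)

    # Build spike trains: one-hot at spike_time, all zeros if silent.
    trains = np.zeros((values.size, n_steps), dtype=np.uint8)
    idx = np.flatnonzero(~silent)
    trains[idx, spike_time[idx]] = 1
    return trains, mode

# Inputs clustered near 0.5 mostly land in the dead zone and stay silent;
# only the outliers 0.9 and 0.1 emit a spike.
x = np.array([0.50, 0.52, 0.48, 0.9, 0.1])
trains, mode = m_ttfs_encode(x, n_steps=8, dead_zone=0.1)
sparsity = 1.0 - trains.mean()
```

Because the silent state costs zero spike movement, shifting it onto the densest region of the input distribution (rather than the lowest value, as in standard TTFS) is what converts the data's statistics directly into energy savings.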
Problem

Research questions and friction points this paper is trying to address.

spiking neural networks
energy efficiency
data movement
hardware cost
LLM inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

masked time-to-first-spike encoding
memristive synapse unit
compute-in-memory
spiking transformer
energy-efficient SNN
Zhanglu Yan
National University of Singapore
Artificial Intelligence
Kaiwen Tang
National University of Singapore
Zixuan Zhu
Shanghai Advanced Research Institute, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing
Zhenyu Bai
Department of Computer Science, National University of Singapore
Qianhui Liu
School of Artificial Intelligence, Shandong University
Weng-Fai Wong
Associate Professor of Computer Science, National University of Singapore
Computer architecture, compilers, high performance computing, embedded systems, parallel processing