NEURAL: An Elastic Neuromorphic Architecture with Hybrid Data-Event Execution and On-the-fly Attention Dataflow

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing SNN hardware suffers from high latency and low energy efficiency due to spike sparsity and multi-timestep execution. This paper proposes NEURAL, a reconfigurable brain-inspired architecture built on a hybrid data- and event-driven execution paradigm: it decouples sparsity-aware processing from neuron computation to enable on-the-fly execution of the spiking QKFormer without dedicated hardware units; it introduces elastic FIFO scheduling and a window-to-time-to-first-spike (W2TTFS) conversion mechanism that replaces average pooling to achieve full-spike execution; and it leverages knowledge distillation to construct high-accuracy, single-timestep, fully spiking SNNs. FPGA evaluation demonstrates that NEURAL reduces resource utilization by 50% and improves energy efficiency by 1.97× over state-of-the-art SNN accelerators, while the KD-trained VGG-11 model gains +3.20% and +5.13% accuracy on CIFAR-10 and CIFAR-100, respectively.

📝 Abstract
Spiking neural networks (SNNs) have emerged as a promising alternative to artificial neural networks (ANNs), offering improved energy efficiency through sparse, event-driven computation. However, existing hardware implementations of SNNs still suffer from inherent spike sparsity and multi-timestep execution, which significantly increase latency and reduce energy efficiency. This study presents NEURAL, a novel neuromorphic architecture based on a hybrid data-event execution paradigm that decouples sparsity-aware processing from neuron computation and uses elastic first-in-first-out (FIFO) buffers. NEURAL supports on-the-fly execution of the spiking QKFormer by embedding its operations within the baseline computing flow, without requiring dedicated hardware units. It also integrates a novel window-to-time-to-first-spike (W2TTFS) mechanism that replaces average pooling and enables full-spike execution. Furthermore, we introduce a knowledge distillation (KD)-based training framework to construct single-timestep SNN models with competitive accuracy. NEURAL is implemented on a Xilinx Virtex-7 FPGA and evaluated using ResNet-11, QKFResNet-11, and VGG-11. Experimental results demonstrate that, at the algorithm level, the VGG-11 model trained with KD improves accuracy by 3.20% on CIFAR-10 and 5.13% on CIFAR-100. At the architecture level, compared to existing SNN accelerators, NEURAL achieves a 50% reduction in resource utilization and a 1.97× improvement in energy efficiency.
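The abstract states that W2TTFS replaces average pooling with a spike-domain operation, but does not spell out the conversion. A minimal sketch of one plausible reading, assuming each pooling window is reduced to its earliest first-spike time (in TTFS coding, stronger activity spikes earlier, so the window minimum plays the role that the window mean plays in average pooling); the function name and window handling are illustrative, not the paper's implementation:

```python
import numpy as np

def w2ttfs_pool(spike_times, window=2):
    """Hypothetical window-to-time-to-first-spike pooling sketch.

    spike_times: 2D array of per-neuron first-spike times, with np.inf
    marking neurons that never spiked. Each non-overlapping
    window x window block is reduced to the earliest spike time in the
    block, so higher local activity (earlier spikes) dominates -- a
    full-spike stand-in for average pooling.
    """
    h, w = spike_times.shape
    h_out, w_out = h // window, w // window
    # Crop to a multiple of the window size, then split into blocks.
    blocks = spike_times[:h_out * window, :w_out * window]
    blocks = blocks.reshape(h_out, window, w_out, window)
    # Earliest spike in each block wins.
    return blocks.min(axis=(1, 3))
```

Because the output is itself a spike time, the result stays in the spike domain and needs no accumulate-and-divide hardware, which is consistent with the abstract's full-spike-execution claim.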
Problem

Research questions and friction points this paper is trying to address.

Addressing spike sparsity and multi-timestep execution inefficiencies in SNN hardware
Implementing hybrid data-event execution paradigm for neuromorphic computing
Enabling on-the-fly attention dataflow without dedicated hardware units
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid data-event execution paradigm
On-the-fly attention dataflow execution
Knowledge distillation training framework
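The KD-based training framework above builds single-timestep SNNs from a higher-accuracy teacher; the card does not specify the objective. A generic sketch using the standard soft-target distillation loss (temperature-softened teacher probabilities plus a hard-label cross-entropy term), where `T` and `alpha` are assumed hyperparameters rather than values from the paper:

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Generic soft-target knowledge distillation loss (sketch).

    student_logits, teacher_logits: (batch, classes) arrays.
    labels: (batch,) integer class labels.
    The soft term is the cross-entropy between temperature-softened
    teacher and student distributions (equals KL divergence up to a
    constant in the student); the hard term is ordinary cross-entropy.
    """
    def softmax(z, t=1.0):
        z = z / t
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    # T^2 factor keeps soft-gradient magnitudes comparable across T.
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    hard = -np.log(
        softmax(student_logits)[np.arange(len(labels)), labels]
    ).mean()
    return alpha * soft + (1 - alpha) * hard
```

In this setup the single-timestep SNN would play the student role, with a multi-timestep SNN or ANN as the teacher, so the student recovers accuracy lost by collapsing to one timestep.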