🤖 AI Summary
To address the low hardware utilization and poor energy efficiency that event sparsity causes in spiking neural network (SNN) inference on general-purpose RISC-V clusters, this paper studies the low-level software design, parallelization, and acceleration of SNNs on multicore clusters equipped with a lightweight RISC-V ISA extension for streaming sparse computations. The approach has three components: (1) a low-overhead RISC-V ISA extension for streaming sparse computations; (2) SpikeStream, an optimization technique that maps weight accesses to affine and indirect register-mapped memory streams; and (3) parallel, event-driven execution across the cores of the cluster. On the end-to-end Spiking-VGG11 model, the method achieves a 4.39× speedup and raises utilization from 9.28% to 52.3% relative to a non-streaming parallel baseline. It also reports a 3.46× energy efficiency gain over LSMCore and a 2.38× performance gain over Loihi. The design targets flexibility and energy efficiency on general-purpose hardware rather than on a specialized neuromorphic processor.
📝 Abstract
Spiking Neural Network (SNN) inference has clear potential for high energy efficiency, as computation is triggered by events. However, the inherent sparsity of events poses challenges for conventional computing systems, which has driven the development of specialized neuromorphic processors. These processors come with high silicon area costs and lack the flexibility needed to run other computational kernels, which limits their widespread adoption. In this paper, we explore the low-level software design, parallelization, and acceleration of SNNs on general-purpose multicore clusters with a low-overhead RISC-V ISA extension for streaming sparse computations. We propose SpikeStream, an optimization technique that maps weight accesses to affine and indirect register-mapped memory streams to enhance performance, utilization, and efficiency. Our results on the end-to-end Spiking-VGG11 model demonstrate a 4.39x speedup and an increase in utilization from 9.28% to 52.3% compared to a non-streaming parallel baseline. Additionally, we achieve an energy efficiency gain of 3.46x over LSMCore and a performance gain of 2.38x over Loihi.
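
The idea of mapping weight accesses to affine and indirect streams can be illustrated with a small software emulation. The sketch below is not the paper's implementation and does not model the ISA extension itself. It only assumes that an indirect stream produces addresses of the form `base + index * stride` for a list of spike indices, and that an affine stream produces `base + k * stride` for a fixed count. All function names (`affine_stream`, `indirect_stream`, `dense_layer`, `event_driven_layer`) are hypothetical, and the neuron dynamics (membrane leak, threshold, reset) are left out.

```python
from typing import Iterator, List


def affine_stream(base: int, stride: int, count: int) -> Iterator[int]:
    """Yield `count` addresses base, base + stride, ... (regular access pattern)."""
    for k in range(count):
        yield base + k * stride


def indirect_stream(base: int, indices: List[int], stride: int) -> Iterator[int]:
    """Yield base + idx * stride for each idx (data-dependent access pattern)."""
    for idx in indices:
        yield base + idx * stride


def dense_layer(spikes: List[int], weights: List[float], n_out: int) -> List[float]:
    """Reference: visit every input, including the ones that did not spike."""
    acc = [0.0] * n_out
    for i, s in enumerate(spikes):
        for j in range(n_out):
            acc[j] += s * weights[i * n_out + j]
    return acc


def event_driven_layer(spike_indices: List[int], weights: List[float],
                       n_out: int) -> List[float]:
    """Accumulate only the weight rows that belong to inputs that spiked."""
    acc = [0.0] * n_out
    # Indirect stream: one row start address per spike event.
    for row_addr in indirect_stream(0, spike_indices, n_out):
        # Affine stream: walk the contiguous weights of that row.
        for j, addr in enumerate(affine_stream(row_addr, 1, n_out)):
            acc[j] += weights[addr]
    return acc


# Hypothetical 4-input, 3-output layer stored row-major in a flat list.
n_in, n_out = 4, 3
weights = [float(v) for v in range(n_in * n_out)]
spikes = [0, 1, 0, 1]                       # binary spike vector
spike_indices = [i for i, s in enumerate(spikes) if s]

dense_result = dense_layer(spikes, weights, n_out)
event_result = event_driven_layer(spike_indices, weights, n_out)
```

Both functions produce the same accumulated input currents, but the event-driven version touches only the rows selected by the spike indices. In this software emulation the address generation still runs as ordinary instructions; the abstract's point is that register-mapped streams take this address work off the compute loop, which is one plausible reading of how utilization improves.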