A Compute and Communication Runtime Model for Loihi 2

📅 2026-01-15

🤖 AI Summary

This work addresses the lack of accurate performance models for predicting runtime on contemporary neuromorphic hardware, particularly models that capture communication overhead. It presents the first max-affine lower-bound runtime model, a multidimensional roofline-style model, tailored to Intel's Loihi 2 chip. The model uses a suite of microbenchmarks to account jointly for compute and Network-on-Chip communication costs. Analytical expressions for communication-bottlenecked runtime reveal an area–runtime trade-off for a linear layer under different spatial workload configurations, with runtime scaling linearly to superlinearly in layer size. The model's predictions agree closely with measurements on Loihi 2 (Pearson correlation coefficient of at least 0.97) for a linear layer, i.e., matrix–vector multiplication, and for a QUBO solver. The model is intended to support the design of high-speed algorithms and kernels for Loihi 2.
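The agreement metric quoted above is the Pearson correlation coefficient between predicted and measured runtimes. The following is a minimal sketch of how such a coefficient can be computed; the function name `pearson` and the sample values are illustrative assumptions (synthetic placeholders, not measurements from the paper).

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all up to the same 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic placeholder values (NOT data from the paper): a lower-bound model
# predicts runtimes at or below what is measured.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.5, 2.4, 3.9, 4.6]
r = pearson(predicted, measured)
```

A high coefficient only says that predicted and measured runtimes vary together linearly; it does not by itself say how far the lower bound sits below the measurements.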

📝 Abstract

Neuromorphic computers hold the potential to vastly improve the speed and efficiency of a wide range of computational kernels with their asynchronous, compute-memory co-located, spatially distributed, and scalable nature. However, performance models that are simple yet sufficiently expressive to predict runtime on actual neuromorphic hardware are lacking, posing a challenge for researchers and developers who strive to design fast algorithms and kernels. As breaking the memory bandwidth wall of conventional von Neumann architectures is a primary neuromorphic advantage, modeling communication time is especially important. At the same time, modeling communication time is difficult, as complex congestion patterns arise in a heavily loaded Network-on-Chip. In this work, we introduce the first max-affine lower-bound runtime model (a multi-dimensional roofline model) for Intel's Loihi 2 neuromorphic chip that quantitatively accounts for both compute and communication based on a suite of microbenchmarks. Despite being a lower-bound model, we observe a tight correspondence (Pearson correlation coefficient greater than or equal to 0.97) between our model's estimated runtime and the measured runtime on Loihi 2 for a neural network linear layer, i.e., matrix-vector multiplication, and for an example application, a Quadratic Unconstrained Binary Optimization (QUBO) solver. Furthermore, we derive analytical expressions for communication-bottlenecked runtime to study the scalability of the linear layer, revealing an area-runtime tradeoff across different spatial workload configurations, with linear to superlinear runtime scaling in layer size and a variety of constant factors. Our max-affine runtime model helps empower the design of high-speed algorithms and kernels for Loihi 2.
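A max-affine lower bound takes the maximum over several affine functions of the workload features, so whichever term is largest acts as the bottleneck, much as in a roofline model. The sketch below illustrates that general form only; the feature names (operation count, message count), the coefficients, and the function name `max_affine_runtime` are hypothetical and are not taken from the paper or from Loihi 2 microbenchmarks.

```python
from typing import Sequence, Tuple

# One affine term: (coefficients, intercept). Its value is coeffs . features + intercept.
AffineTerm = Tuple[Sequence[float], float]


def max_affine_runtime(features: Sequence[float], terms: Sequence[AffineTerm]) -> float:
    """Evaluate max_i (a_i . x + b_i), a max-affine lower bound on runtime."""
    if not terms:
        raise ValueError("at least one affine term is required")
    values = []
    for coeffs, intercept in terms:
        if len(coeffs) != len(features):
            raise ValueError("coefficient and feature lengths differ")
        values.append(sum(c * f for c, f in zip(coeffs, features)) + intercept)
    # The largest term is the binding bottleneck for this workload.
    return max(values)


# Hypothetical terms over features (num_ops, num_messages); values are made up.
TERMS = [
    ((2.0, 0.0), 1.0),  # assumed compute-bound term: 2.0 * num_ops + 1.0
    ((0.0, 5.0), 3.0),  # assumed communication-bound term: 5.0 * num_messages + 3.0
]

comm_bound = max_affine_runtime((10.0, 4.0), TERMS)     # communication term dominates
compute_bound = max_affine_runtime((20.0, 1.0), TERMS)  # compute term dominates
```

Because every term is individually a lower bound, their maximum is also a lower bound, and it is the tightest one the set of terms can provide.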

Problem

Research questions and friction points this paper is trying to address.

neuromorphic computing

runtime modeling

communication bottleneck

Loihi 2

performance prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

neuromorphic computing

runtime modeling

Loihi 2