🤖 AI Summary
This work addresses the challenge of efficiently mapping neural network computation graphs onto spatial accelerators by proposing the first hardware-in-the-loop automated mapping framework based on evolutionary algorithms. The approach formulates mapping as a black-box optimization problem, eliminating the need for expert heuristics or in-depth hardware knowledge. Candidate mappings are evaluated directly on the Intel Loihi 2 neuromorphic chip—a 152-core 2D-mesh architecture—enabling end-to-end automated deployment. Compared with the default heuristic strategies, the method reduces total latency by up to 35% on two sparse multi-layer perceptron networks and, without explicitly optimizing for it, improves energy efficiency by up to 40%. The framework also scales to multi-chip systems, enhancing both deployment efficiency and overall performance.
📝 Abstract
Spatial accelerators, composed of arrays of integrated compute-memory units, offer an attractive platform for deploying inference workloads with low latency and low energy consumption. However, fully exploiting their architectural advantages typically requires careful, expert-driven mapping of computational graphs to distributed processing elements. In this work, we automate this process by framing the mapping challenge as a black-box optimization problem. We introduce the first evolutionary, hardware-in-the-loop mapping framework for neuromorphic accelerators, enabling users without deep hardware knowledge to deploy workloads more efficiently. We evaluate our approach on Intel Loihi 2, a representative spatial accelerator featuring 152 cores per chip in a 2D mesh. Our method achieves an up to 35% reduction in total latency compared to default heuristics on two sparse multi-layer perceptron networks. Furthermore, we demonstrate the scalability of our approach to multi-chip systems and observe an up to 40% improvement in energy efficiency, without explicitly optimizing for it.
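To make the idea concrete, here is a minimal sketch of evolutionary black-box mapping optimization. Everything below is illustrative, not the paper's implementation: the mesh size, the neuron-group count, the traffic pattern, and especially the `cost` function are hypothetical stand-ins. In the actual framework, each candidate mapping would be deployed and timed on the Loihi 2 chip itself (hardware in the loop); here a Manhattan hop-count proxy takes that role so the sketch runs anywhere.

```python
import random

# Toy setup (assumed, not from the paper): place NUM_GROUPS neuron groups
# on a small 2D mesh of cores. Loihi 2 has 152 cores per chip; we use a
# 4x4 mesh to keep the example tiny.
MESH_W, MESH_H = 4, 4
NUM_GROUPS = 8
ALL_CORES = [(x, y) for x in range(MESH_W) for y in range(MESH_H)]

# Toy traffic pattern: group i feeds group i+1, as in a layered network.
TRAFFIC = [(i, i + 1) for i in range(NUM_GROUPS - 1)]

def cost(mapping):
    """Stand-in for an on-hardware latency measurement: total Manhattan
    hop count of all inter-group traffic on the mesh. The real framework
    would instead run the workload on the chip and read back latency."""
    total = 0
    for a, b in TRAFFIC:
        (xa, ya), (xb, yb) = mapping[a], mapping[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total

def random_mapping(rng):
    # A mapping assigns each group a distinct core.
    return rng.sample(ALL_CORES, NUM_GROUPS)

def mutate(mapping, rng):
    child = list(mapping)
    if rng.random() < 0.5:
        # Swap the placements of two groups.
        i, j = rng.sample(range(NUM_GROUPS), 2)
        child[i], child[j] = child[j], child[i]
    else:
        # Relocate one group to a currently unused core.
        free = [c for c in ALL_CORES if c not in child]
        child[rng.randrange(NUM_GROUPS)] = rng.choice(free)
    return child

def evolve(generations=200, pop_size=16, seed=0):
    rng = random.Random(seed)
    pop = [random_mapping(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)          # one "hardware" evaluation per candidate
        elite = pop[: pop_size // 4]
        pop = elite + [mutate(rng.choice(elite), rng)
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=cost)

best = evolve()
print("best mapping cost:", cost(best))
```

The black-box character is the key point: the optimizer only ever sees a candidate mapping in and a scalar cost out, so swapping the proxy for real chip measurements changes nothing in the search loop, which is what makes the hardware-in-the-loop formulation portable across accelerators.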