🤖 AI Summary
This work addresses the challenge of deploying large, complex deep neural networks (DNNs)—notably Transformers—on the SpiNNaker2 neuromorphic MPSoC for efficient inference. We present an end-to-end PyTorch-to-SpiNNaker2 inference framework, built upon an extended OctopuScheduler that integrates multi-layer DNN scheduling, model quantization, operator lowering, and neuromorphic hardware mapping. Our framework enables full compilation and execution of PyTorch models directly on a single SpiNNaker2 chip. To our knowledge, this is the first demonstration of end-to-end transformer-scale DNN inference on SpiNNaker2. The work addresses a key bottleneck in neuromorphic hardware support for large-scale DNN inference and provides a scalable, system-level solution for deploying complex AI models at the edge.
📝 Abstract
This work presents a multi-layer DNN scheduling framework as an extension of OctopuScheduler, providing an end-to-end flow from PyTorch models to inference on a single SpiNNaker2 chip. Together with a front-end comprising quantization and lowering steps, the proposed framework enables the edge-based execution of large and complex DNNs up to transformer scale using the neuromorphic platform SpiNNaker2.
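To illustrate the kind of quantization step such a front-end performs, the sketch below shows symmetric per-tensor int8 weight quantization, a common choice when targeting fixed-point accelerators. This is only a minimal, hypothetical example; the paper's actual quantization scheme and granularity may differ.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    A front-end like this maps float32 weights to int8 plus a
    single scale factor before hardware mapping; the real
    framework's scheme may be finer-grained (e.g. per-channel).
    """
    scale = float(np.abs(w).max()) / 127.0 if w.size else 1.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 + scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight vector and check reconstruction.
w = np.array([0.5, -1.27, 0.01, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Symmetric quantization keeps zero exactly representable, which simplifies integer matrix multiplication on accelerator hardware at the cost of slightly coarser resolution for asymmetric weight distributions.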