🤖 AI Summary
Edge AI demands low-power models that can learn online without depending on cloud infrastructure, but conventional deep learning incurs prohibitive computational overhead, and existing brain-inspired neural networks such as the Bayesian Confidence Propagation Neural Network (BCPNN) have so far required GPUs or data-center FPGAs, hindering deployment on resource-constrained embedded edge devices. This work presents the first embedded FPGA neuromorphic accelerator for BCPNN, targeting the Xilinx Zynq UltraScale+ SoC and supporting on-chip online learning and real-time inference. Built with high-level synthesis (HLS), the design exploits sparse connectivity and localized learning rules and offers dynamic mixed-precision configuration. Evaluated on the MNIST, pneumonia, and breast cancer datasets, it achieves up to 17.5× lower latency and 94% lower energy than an ARM CPU baseline with no loss of accuracy, advancing neuromorphic computing from cloud and data-center environments toward practical deployment in power- and resource-limited edge scenarios.
📝 Abstract
Edge AI applications increasingly require models that can learn and adapt on-device within a minimal energy budget. Traditional deep learning models, while powerful, are often overparameterized, energy-hungry, and dependent on cloud connectivity. Brain-Like Neural Networks (BLNNs), such as the Bayesian Confidence Propagation Neural Network (BCPNN), offer a neuromorphic alternative by mimicking cortical architecture and biologically constrained learning. Their sparse architectures, local learning rules, and unsupervised/semi-supervised learning make them well suited for low-power edge intelligence. However, existing BCPNN implementations rely on GPUs or datacenter FPGAs, limiting their applicability to embedded systems. This work presents the first embedded FPGA accelerator for BCPNN on a Zynq UltraScale+ SoC using High-Level Synthesis. We implement both online-learning and inference-only kernels with support for variable and mixed precision. Evaluated on the MNIST, Pneumonia, and Breast Cancer datasets, our accelerator achieves up to 17.5× lower latency and 94% energy savings over ARM baselines, without sacrificing accuracy. This work enables practical neuromorphic computing on edge devices, bridging the gap between brain-like learning and real-world deployment.
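The local learning rule referenced above is what makes BCPNN attractive for on-chip online learning: each weight depends only on exponentially smoothed traces of its pre- and post-synaptic activity, with weights formed as log-probability ratios. A minimal NumPy sketch of this idea follows; the function name, variable names, and the trace time constant `tau` are illustrative assumptions, not details taken from this paper's implementation:

```python
import numpy as np

def bcpnn_update(z_pre, z_post, p_i, p_j, p_ij, tau=100.0, eps=1e-6):
    """One illustrative step of a BCPNN-style local learning rule.

    z_pre, z_post : current pre-/post-synaptic activations in [0, 1]
    p_i, p_j      : running estimates of unit activation probabilities
    p_ij          : running estimate of pairwise co-activation probabilities
    tau           : trace time constant (assumed value, in update steps)
    """
    a = 1.0 / tau  # smoothing factor derived from the trace time constant
    # Low-pass-filtered (exponential moving average) probability estimates;
    # each update touches only locally available quantities.
    p_i = (1 - a) * p_i + a * z_pre
    p_j = (1 - a) * p_j + a * z_post
    p_ij = (1 - a) * p_ij + a * np.outer(z_pre, z_post)
    # Weights are log-odds of co-activation vs. independence ("Bayesian
    # confidence"); biases are log unit probabilities. eps guards log(0).
    w = np.log((p_ij + eps) / (np.outer(p_i, p_j) + eps))
    b = np.log(p_j + eps)
    return p_i, p_j, p_ij, w, b
```

Because every quantity in the update is either a per-unit trace or a per-synapse trace, the rule maps naturally onto a streaming FPGA kernel with no global gradient communication, which is the property the accelerator exploits.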