🤖 AI Summary
To address the low throughput and high power consumption of Bayesian Confidence Propagation Neural Network (BCPNN) implementations on CPU/GPU-based edge devices, this paper proposes a streaming, reconfigurable FPGA hardware accelerator tailored to BCPNN. The design combines a streaming architecture with a reconfigurable acceleration approach, integrating Vitis HLS-based high-level synthesis, custom BCPNN computational units, and first-principles performance modeling to co-optimize energy efficiency and throughput. Experimental evaluation demonstrates that, compared to an NVIDIA A100 GPU, the accelerator achieves a 1.3–5.3× speedup, reduces power consumption by 2.62–3.19×, and lowers energy consumption by 5.8–16.5×, all without any accuracy loss. This work establishes a practical deployment path for brain-inspired neural networks in resource-constrained edge computing scenarios.
📝 Abstract
Brain-inspired algorithms are attractive and emerging alternatives to classical deep learning methods for various machine learning applications. Brain-inspired systems can feature local learning rules, unsupervised/semi-supervised learning, and different types of plasticity (structural and synaptic), allowing them to potentially be faster and more energy-efficient than traditional machine learning alternatives. Among the more salient brain-inspired algorithms are Bayesian Confidence Propagation Neural Networks (BCPNNs). BCPNN is an important tool for both machine learning and computational neuroscience research, and recent work shows that BCPNN can reach state-of-the-art performance in tasks such as learning and memory recall compared to other models. Unfortunately, BCPNN is primarily executed on slow general-purpose processors (CPUs) or power-hungry graphics processing units (GPUs), limiting the applicability of BCPNN in, among others, edge systems. In this work, we design a custom stream-based accelerator for BCPNN on Field-Programmable Gate Arrays (FPGAs) using the Xilinx Vitis High-Level Synthesis (HLS) flow. Furthermore, we model our accelerator's performance from first principles, and we empirically show that our proposed accelerator is between 1.3x - 5.3x faster than an Nvidia A100 GPU while consuming between 2.62x - 3.19x less power and 5.8x - 16.5x less energy, without any degradation in accuracy.
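For readers unfamiliar with BCPNN, the core of the model is a local, Bayesian learning rule: weights are log odds of co-activation probabilities estimated online from activity traces. The sketch below illustrates the classic formulation (weights w_ij = log(P(x_i, x_j) / (P(x_i) P(x_j))), biases b_j = log P(x_j)); the trace time constant `tau` and the floor `eps` are illustrative assumptions, not parameters from this paper's accelerator.

```python
import math

def update_traces(p_i, p_j, p_ij, x_i, x_j, tau=100.0):
    """Exponentially smoothed estimates of P(x_i), P(x_j), and P(x_i, x_j).

    `tau` is an illustrative trace time constant, not a value from the paper.
    """
    k = 1.0 / tau
    p_i = (1 - k) * p_i + k * x_i
    p_j = (1 - k) * p_j + k * x_j
    p_ij = (1 - k) * p_ij + k * x_i * x_j
    return p_i, p_j, p_ij

def bcpnn_weight(p_i, p_j, p_ij, eps=1e-6):
    """w_ij = log(P(x_i, x_j) / (P(x_i) * P(x_j))), floored to avoid log(0)."""
    return math.log(max(p_ij, eps) / (max(p_i, eps) * max(p_j, eps)))

def bcpnn_bias(p_j, eps=1e-6):
    """b_j = log P(x_j)."""
    return math.log(max(p_j, eps))
```

Note how the rule is purely local: each weight depends only on the activity statistics of its two endpoint units, which is what makes the computation amenable to the streaming, deeply pipelined dataflow this paper maps onto an FPGA. Statistically independent units (p_ij = p_i * p_j) yield a zero weight, co-active units a positive one.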