🤖 AI Summary
This work addresses the resource and latency constraints of deploying one-dimensional convolutional neural networks (1D-CNNs) on tiny smart sensor systems, using atrial fibrillation detection as a case study. The authors propose an FPGA implementation based on lookup table (LUT) precomputation, in which all possible outputs of a layer are precomputed and stored in the FPGA's LUTs. Generalizing earlier work on depthwise separable convolutions to other forms of grouped convolution, they design a new convolutional block and an algorithm that guides the choice of its hyperparameters, improving the scalability of the approach. Implemented on an AMD Spartan-7 S15 FPGA using only 2,844 LUTs and no DSP slices or block RAM, the accelerator reaches an F1 score of up to 95%, showing that ultra-low-latency, resource-efficient edge deployment is feasible.
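The scalability problem can be quantified with a rough count. Assuming every output value of a LUT-precomputed 1D convolution is a full truth table over all quantized input bits it reads, one output depends on kernel_size × (in_channels / groups) × bits input bits, and its table has 2^n entries for n such bits. The minimal sketch below uses hypothetical helper names and layer sizes (kernel size 3, 8 input channels, 2-bit activations) to compare a standard convolution with grouped, depthwise, and pointwise variants; it does not reproduce the paper's convolutional block or its hyperparameter algorithm.

```python
# Rough truth-table size estimate for LUT-precomputed grouped 1D convolutions.
# Hypothetical illustration: helper names and layer sizes are assumptions.

LUT_INPUTS = 6  # 6-input LUTs, as in AMD 7-series devices such as Spartan-7


def fan_in_bits(kernel_size: int, in_channels: int, groups: int, bits: int) -> int:
    """Number of input bits that one output value of a grouped Conv1d reads."""
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    return kernel_size * (in_channels // groups) * bits


def table_entries(kernel_size: int, in_channels: int, groups: int, bits: int) -> int:
    """Entries in a full truth table for one output value: 2 ** fan-in bits."""
    return 2 ** fan_in_bits(kernel_size, in_channels, groups, bits)


def fits_one_lut_per_output_bit(fan_in: int) -> bool:
    """True if each output bit is a Boolean function small enough for one LUT."""
    return fan_in <= LUT_INPUTS


if __name__ == "__main__":
    c_in, bits = 8, 2
    variants = [
        ("standard conv, g=1", 3, 1),
        ("grouped conv, g=4", 3, 4),
        ("depthwise conv, g=8", 3, 8),
        ("pointwise conv, g=1", 1, 1),
        ("grouped pointwise, g=4", 1, 4),
    ]
    for label, k, g in variants:
        n = fan_in_bits(k, c_in, g, bits)
        print(f"{label:24s} fan-in {n:2d} bits, "
              f"{table_entries(k, c_in, g, bits)} entries, "
              f"single LUT per bit: {fits_one_lut_per_output_bit(n)}")
```

Raising `groups` from 1 to `in_channels` (depthwise) shrinks the fan-in in this example from 48 to 6 bits, i.e. from about 2.8 × 10^14 table entries to 64, which is why grouped and depthwise-separable structures make LUT precomputation tractable. The count ignores logic sharing, table decomposition, and the channel mixing that a complete block still needs.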
📝 Abstract
1D-CNNs play a crucial role in time-series analysis on tiny smart sensor systems, e.g. for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as a promising optimization technique for implementing such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the FPGA's lookup tables. This enables highly resource-efficient networks with ultra-low latency but scales poorly, because the size of each precomputed table grows exponentially with the number of input bits it depends on. Previous work has explored using depthwise-separable convolutions to improve scalability. In this paper, we generalize this approach to additional forms of grouped convolutions. Based on this, we propose a novel type of convolutional block and an algorithm to guide the choice of hyperparameters for this block. We evaluate our approach on a medical time-series dataset for predicting atrial fibrillation, built from ECG recordings in the MIT-BIH database. The resulting hardware accelerators are small enough to be deployed on an AMD Spartan-7 S15. They achieve an F1 score of up to 95% while requiring only 2,844 LUTs and no DSPs or BRAM.
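To make the core idea of precomputation concrete, here is a minimal Python sketch that tabulates a single quantized neuron. The weights, bias, 2-bit input and output widths, and the helper names (`neuron`, `pack`, `build_table`, `neuron_lut`, `output_bit_masks`) are hypothetical illustrations, not taken from the paper; the sketch only shows that a table indexed by the concatenated input bits reproduces the arithmetic exactly.

```python
from itertools import product

# Hypothetical quantized neuron: 3 unsigned 2-bit inputs, integer weights,
# ReLU, and a 2-bit saturated output. All values here are assumptions.
WEIGHTS = (1, -2, 3)
BIAS = 1
IN_BITS = 2
OUT_BITS = 2
OUT_MAX = 2 ** OUT_BITS - 1


def neuron(xs):
    """Reference arithmetic: weighted sum, ReLU, then saturate to OUT_BITS."""
    acc = BIAS + sum(w * x for w, x in zip(WEIGHTS, xs))
    return min(max(acc, 0), OUT_MAX)


def pack(xs):
    """Concatenate the inputs' bits into a single table address."""
    addr = 0
    for x in xs:
        addr = (addr << IN_BITS) | x
    return addr


def build_table():
    """Precompute the neuron's output for every possible input combination."""
    table = [0] * (2 ** (IN_BITS * len(WEIGHTS)))
    for xs in product(range(2 ** IN_BITS), repeat=len(WEIGHTS)):
        table[pack(xs)] = neuron(xs)
    return table


TABLE = build_table()


def neuron_lut(xs):
    """Inference by lookup only: no multiplications or additions."""
    return TABLE[pack(xs)]


def output_bit_masks(table):
    """Split the table into one Boolean truth table per output bit,
    encoded as an integer whose bit `addr` is the output for address `addr`."""
    return [
        sum(((value >> bit) & 1) << addr for addr, value in enumerate(table))
        for bit in range(OUT_BITS)
    ]


if __name__ == "__main__":
    print(len(TABLE))                # 64 entries for 3 x 2 = 6 input bits
    print(neuron_lut((2, 1, 0)))     # 1 + 2 - 2 + 0 = 1
    print([hex(m) for m in output_bit_masks(TABLE)])
```

Because this neuron reads 3 × 2 = 6 input bits, each of its 2 output bits is a 6-input Boolean function that fits in a single 6-input LUT, so no multipliers or adders remain in hardware. How a real toolflow assigns table addresses to LUT pins is synthesis-dependent and not modelled here.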