Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design

📅 2026-04-21

🤖 AI Summary
Deploying LSTM networks on resource-constrained embedded FPGAs is difficult because both the energy budget and the inference latency are tight. This work proposes a parameterised LSTM hardware accelerator that improves execution speed and reduces energy consumption compared to related work. The design stays adaptable through a set of optimisation parameters, such as the usage of DSPs and the implementation of the activation functions. The accelerator achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32,873 samples/s, targeting on-device time-series analysis of local sensor data.

📝 Abstract
Long Short-term Memory Networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves the execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during a real-time inference with 32873 samples/s.
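
To make the two reported figures easier to interpret, the following minimal Python sketch converts them into related quantities. It relies only on the unit identity that 1 GOP/s/W equals 10^9 operations per joule. The function names are illustrative, and `ops_per_sample` and `power_w` are hypothetical inputs: the abstract does not state the model's operation count or the measured power.

```python
# Figures quoted in the abstract.
REPORTED_EFFICIENCY_GOPS_PER_W = 11.89
REPORTED_THROUGHPUT_SAMPLES_PER_S = 32873


def energy_per_op_picojoules(gops_per_watt: float) -> float:
    """Energy per operation implied by an efficiency in GOP/s/W.

    1 GOP/s/W is 1e9 operations per joule, so the energy per
    operation is 1 / (gops_per_watt * 1e9) joules.
    """
    if gops_per_watt <= 0:
        raise ValueError("gops_per_watt must be positive")
    joules_per_op = 1.0 / (gops_per_watt * 1e9)
    return joules_per_op * 1e12  # joules -> picojoules


def average_time_per_sample_us(samples_per_s: float) -> float:
    """Average time available per sample, in microseconds."""
    if samples_per_s <= 0:
        raise ValueError("samples_per_s must be positive")
    return 1e6 / samples_per_s


def efficiency_gops_per_watt(ops_per_sample: float,
                             samples_per_s: float,
                             power_w: float) -> float:
    """Energy efficiency in GOP/s/W from hypothetical measurements.

    ops_per_sample and power_w are assumptions supplied by the
    caller; neither value is given in the abstract.
    """
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    gops = ops_per_sample * samples_per_s / 1e9
    return gops / power_w


if __name__ == "__main__":
    pj = energy_per_op_picojoules(REPORTED_EFFICIENCY_GOPS_PER_W)
    us = average_time_per_sample_us(REPORTED_THROUGHPUT_SAMPLES_PER_S)
    print(f"{pj:.1f} pJ per operation")
    print(f"{us:.2f} us per sample on average")
```

Under these definitions, 11.89 GOP/s/W corresponds to roughly 84 pJ per operation, and 32873 samples/s corresponds to an average of about 30.4 µs per sample.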
Problem

Research questions and friction points this paper is trying to address.

LSTM
Energy Efficiency
Embedded FPGA
Hardware Accelerator
On-device Inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

LSTM accelerator
embedded FPGA
energy efficiency
parameterised architecture
on-device inference