Automating Versatile Time-Series Analysis with Tiny Transformers on Embedded FPGAs

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address resource constraints, low automation, and poor task generalizability when deploying Transformers on embedded FPGAs, this paper introduces the first fully automated Tiny Transformer deployment framework tailored for time-series analysis, covering forecasting, classification, and anomaly detection. Methodologically, it integrates 4-bit quantization-aware training, Optuna-driven hardware-aware hyperparameter search, and automatic VHDL code generation. This enables, for the first time, integer-only, task-specialized encoder-only accelerators on lightweight FPGAs such as the AMD (formerly Xilinx) Spartan-7 and Lattice iCE40. Experimental results demonstrate energy efficiency as low as 0.033 mJ per inference with millisecond-scale latency on the AMD Spartan-7. The framework is validated on six public time-series benchmarks across two embedded FPGA platforms, and all source code is publicly released.
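As a quick sanity check on the reported figures (using an assumed 1 ms latency, since the summary only says "millisecond-scale"), dividing energy per inference by latency gives the implied average power draw:

```python
energy_mJ = 0.033   # reported energy per inference on AMD Spartan-7
latency_ms = 1.0    # assumed value; the paper reports only "millisecond-scale" latency

# mJ / ms has units of watts; multiply by 1000 to express in milliwatts.
power_mW = energy_mJ / latency_ms * 1000
print(f"implied average power: {power_mW:.1f} mW")
```

Under that assumption the accelerator draws on the order of tens of milliwatts during inference, consistent with a battery-powered embedded deployment.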

📝 Abstract
Transformer-based models have shown strong performance across diverse time-series tasks, but their deployment on resource-constrained devices remains challenging due to high memory and computational demand. While prior work targeting Microcontroller Units (MCUs) has explored hardware-specific optimizations, such approaches are often task-specific and limited to 8-bit fixed-point precision. Field-Programmable Gate Arrays (FPGAs) offer greater flexibility, enabling fine-grained control over data precision and architecture. However, existing FPGA-based deployments of Transformers for time-series analysis typically focus on high-density platforms with manual configuration. This paper presents a unified and fully automated deployment framework for Tiny Transformers on embedded FPGAs. Our framework supports a compact encoder-only Transformer architecture across three representative time-series tasks (forecasting, classification, and anomaly detection). It combines quantization-aware training (down to 4 bits), hardware-aware hyperparameter search using Optuna, and automatic VHDL generation for seamless deployment. We evaluate our framework on six public datasets across two embedded FPGA platforms. Results show that our framework produces integer-only, task-specific Transformer accelerators achieving as low as 0.033 mJ per inference with millisecond latency on AMD Spartan-7, while also providing insights into deployment feasibility on Lattice iCE40. All source code will be released in the GitHub repository (https://github.com/Edwina1030/TinyTransformer4TS).
Problem

Research questions and friction points this paper is trying to address.

Deploying Transformer models on resource-constrained embedded FPGAs
Efficiently automating time-series analysis with Tiny Transformers
Supporting multiple tasks with low-precision, integer-only accelerators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated deployment framework for Tiny Transformers
4-bit quantization-aware training for efficiency
Automatic VHDL generation for seamless deployment
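To make the 4-bit quantization idea concrete, here is a minimal, framework-free sketch of symmetric uniform fake quantization, the core operation in quantization-aware training (the paper's actual QAT pipeline, scale handling, and integer-only inference path are not reproduced here):

```python
def fake_quantize(x, bits=4):
    """Quantize a list of floats to signed `bits`-bit integers and back.
    In QAT the forward pass uses these rounded values so the model learns
    to tolerate low precision (a straight-through estimator handles
    gradients in a real training loop)."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for 4-bit signed
    scale = max(abs(v) for v in x) / qmax or 1.0      # avoid zero scale
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    dequant = [qi * scale for qi in q]
    return dequant, q

dequant, q = fake_quantize([0.9, -0.31, 0.05, 0.42])
print(q)  # integer codes in the signed 4-bit range [-8, 7]
```

At deployment time only the integer codes and scales are needed, which is what makes the integer-only accelerators described above possible.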
Tianheng Ling
Department of Intelligent Embedded Systems, University of Duisburg-Essen
Chao Qian
Nanjing University
Artificial intelligence, evolutionary algorithms, machine learning
Lukas Johannes Hassler
Department of Intelligent Embedded Systems, University of Duisburg-Essen
Gregor Schiele
Professor of Computer Science (Embedded Systems), University Duisburg-Essen, Germany
embedded AI, IoT, embedded software, adaptive SW and reconfigurable HW