🤖 AI Summary
To address the energy-efficiency and hardware-compatibility challenges of deploying Transformer models in tinyML settings, this paper proposes a heterogeneous acceleration architecture and an end-to-end automated deployment methodology tailored to 8-bit quantized attention inference. The method integrates a RISC-V octa-core cluster with a custom attention accelerator, co-designed under stringent tinyML power constraints. Key contributions include: (1) the first heterogeneous architecture enabling 8-bit quantized Transformer inference within typical tinyML power budgets; and (2) a scalable heterogeneous template implemented in 22 nm FD-SOI technology, coupled with a compiler and deployment co-optimization flow. Experimental results show that the system achieves 2960 GOp/J energy efficiency and 154 GOp/s throughput at 0.65 V, setting a new state of the art in energy efficiency for Transformer inference in the tinyML domain.
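As a quick sanity check, derived from the two reported figures rather than stated in the source, dividing throughput by energy efficiency gives the implied power draw at the 0.65 V operating point, which lands comfortably inside a typical tinyML budget of tens of milliwatts:

```latex
P = \frac{\text{throughput}}{\text{energy efficiency}}
  = \frac{154\ \text{GOp/s}}{2960\ \text{GOp/J}}
  \approx 0.052\ \text{W} = 52\ \text{mW}
```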
📝 Abstract
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate Attention-based models within a tinyML power envelope using an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
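To make "8-bit quantized Attention" concrete, here is a minimal NumPy sketch of generic int8 attention with int32 accumulation. It is not the paper's accelerator datapath: the per-tensor scales, the rounding scheme, and especially the float softmax are illustrative assumptions (a dedicated accelerator would use a fixed-point softmax approximation so the kernel stays integer-only).

```python
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization: round(x / scale), clipped to [-128, 127]."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_attention(q, k, v, s_q, s_k, s_v, s_out, d):
    """Single-head attention on int8 inputs with int32 accumulation.

    q, k, v : int8 arrays of shape [seq, d]
    s_q, s_k, s_v, s_out : per-tensor scales (illustrative assumptions)
    """
    # Integer matmul with 32-bit accumulators, as a MAC-array datapath would do.
    logits_i32 = q.astype(np.int32) @ k.astype(np.int32).T
    # Dequantize logits and apply the 1/sqrt(d) scaling; a real integer datapath
    # would fold this into a fixed-point softmax instead of going through float.
    logits = logits_i32 * (s_q * s_k) / np.sqrt(d)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Requantize probabilities to int8 so the second matmul is also integer.
    s_p = 1.0 / 127.0
    probs_i8 = quantize(probs, s_p)
    out_i32 = probs_i8.astype(np.int32) @ v.astype(np.int32)
    # Final requantization back to int8 activations.
    return quantize(out_i32 * (s_p * s_v), s_out)

rng = np.random.default_rng(0)
seq, d = 16, 64
q = rng.integers(-128, 128, (seq, d), dtype=np.int8)
k = rng.integers(-128, 128, (seq, d), dtype=np.int8)
v = rng.integers(-128, 128, (seq, d), dtype=np.int8)
y = int8_attention(q, k, v, s_q=0.02, s_k=0.02, s_v=0.02, s_out=0.05, d=d)
print(y.shape, y.dtype)  # (16, 64) int8
```

Storing activations as int8 while accumulating dot products in int32 is the standard trade-off that keeps an accelerator's datapath and memory traffic narrow without overflowing on long reductions.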