All-Rounder: A Flexible AI Accelerator With Diverse Data Format Support and Morphable Structure for Multi-DNN Processing

📅 2023-10-25
🏛️ IEEE Transactions on Very Large Scale Integration (VLSI) Systems
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing AI accelerators (e.g., TPU, NNP-I/T) offer limited support for multiple data formats and heterogeneous DNN operators, which leads to poor model adaptability and low hardware utilization. To address this, we propose All-Rounder, a reconfigurable AI accelerator that supports integer and floating-point arithmetic at multiple bit-widths. Its key innovations are: (1) an "all-in-one" multiplier that covers these formats while occupying 1.49× smaller area (roughly 33% less) than dedicated per-format multipliers; (2) a MAC array that can be dynamically partitioned into blocks and fused, so the hardware can be reshaped to fit each operator; and (3) a data-format-aware scheduling framework integrated with hardware-level operator fusion. Evaluated on vision and large language model (LLM) workloads, All-Rounder achieves on average 1.8× higher energy efficiency and 1.6× lower latency than state-of-the-art TPUs and NPUs, improving both flexibility and efficiency for heterogeneous model inference.
📝 Abstract
Recognizing the explosive growth in AI-based applications, several companies have developed custom ASICs (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and built hyperscale cloud infrastructure around them. These ASICs execute the inference or training workloads of AI models requested by users. Because AI models differ in data format and operation type, the ASICs need to support diverse data formats and various operation shapes. However, previous ASIC solutions meet these requirements only partially or not at all. To overcome these limitations, we first present an area-efficient multiplier, named the all-in-one multiplier, that supports multiple bit-widths for both integer and floating-point data types. We then build a MAC array from these multipliers to provide multi-format support. In addition, the MAC array can be partitioned into multiple blocks that can be flexibly fused to support various DNN operation types. We evaluate the practical effectiveness of the proposed MAC array by building an accelerator around it, named All-Rounder. According to our evaluation, the all-in-one multiplier occupies a 1.49× smaller area than baselines that use a dedicated multiplier for each data format. We then compare the performance and energy efficiency of All-Rounder against three different accelerators, showing consistent speedups and higher energy efficiency across various AI benchmarks, from vision to LLM-based language tasks.
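The abstract does not describe how the all-in-one multiplier is organized internally. As a rough intuition for "multiple bit-widths in one unit", a common approach in precision-scalable integer multipliers is to build a wide multiply from narrow sub-multipliers whose shifted partial products are summed, and to let the same sub-multipliers run independently at low precision. The minimal Python model below assumes that generic scheme; the function names (`mul4`, `mul8_fused`, `mul4_split`) are hypothetical, only unsigned integers are modeled, and the floating-point path and the paper's actual circuit are not represented.

```python
"""Hypothetical sketch of a generic precision-scalable integer multiplier:
four 4x4-bit sub-multipliers either fuse into one 8x8-bit multiply or run as
four independent 4-bit multiplies. Not the paper's all-in-one design; unsigned only."""

from typing import List, Tuple


def mul4(a: int, b: int) -> int:
    """Model of one 4-bit x 4-bit unsigned sub-multiplier (8-bit result)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8_fused(a: int, b: int) -> int:
    """8-bit mode: split each operand into nibbles and sum shifted partial products.
    a*b = (aH*bH << 8) + (aH*bL << 4) + (aL*bH << 4) + aL*bL
    """
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((mul4(a_hi, b_hi) << 8)
            + (mul4(a_hi, b_lo) << 4)
            + (mul4(a_lo, b_hi) << 4)
            + mul4(a_lo, b_lo))


def mul4_split(pairs: List[Tuple[int, int]]) -> List[int]:
    """4-bit mode: the same four sub-multipliers compute four independent products."""
    assert len(pairs) == 4
    return [mul4(a, b) for a, b in pairs]
```

For example, `mul8_fused(200, 123)` returns `24600`, and `mul4_split([(3, 5), (15, 15), (0, 9), (7, 2)])` returns `[15, 225, 0, 14]`. The point of the sketch is that the same four small multipliers serve both precisions, which is one way a single unit can avoid the area of separate per-format multipliers.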
Problem

Supporting diverse data formats across AI models
Processing various DNN operation types flexibly
Improving the area and energy efficiency of AI accelerators
Innovation

All-in-one multiplier supporting multiple bit-widths for integer and floating-point data
MAC array with multi-format support and flexible partitioning
All-Rounder accelerator with improved speed and energy efficiency
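Why flexible partitioning helps can be illustrated with spatial utilization: a fixed array shape wastes processing elements (PEs) when a layer's dimensions do not match it, while fusing blocks into a different shape can recover them. The sketch below is hypothetical throughout: it assumes four 32x32 blocks, three fused shapes (`"4x1"`, `"2x2"`, `"1x4"`), and a weight-stationary mapping with the reduction dimension K on rows and the output dimension N on columns. It ignores the temporal dimension and any independent-block mode, and is not the paper's configuration or scheduler.

```python
"""Hypothetical sketch: picking a fused shape for a MAC array made of four
32x32 blocks. Block size, candidate shapes, and the weight-stationary mapping
(K -> rows, N -> columns) are illustrative assumptions."""

import math
from typing import Dict, Tuple

BLOCK = 32  # assumed PEs per block side

# Ways to fuse four BLOCK x BLOCK blocks into one logical (rows, cols) array.
FUSED_SHAPES: Dict[str, Tuple[int, int]] = {
    "4x1": (4 * BLOCK, BLOCK),
    "2x2": (2 * BLOCK, 2 * BLOCK),
    "1x4": (BLOCK, 4 * BLOCK),
}


def utilization(k: int, n: int, rows: int, cols: int) -> float:
    """Fraction of PEs holding useful weights when a K x N weight matrix
    is tiled onto a rows x cols array (partial edge tiles waste PEs)."""
    tiles = math.ceil(k / rows) * math.ceil(n / cols)
    return (k * n) / (tiles * rows * cols)


def best_shape(k: int, n: int) -> Tuple[str, float]:
    """Return the fused shape with the highest spatial utilization for K x N."""
    scores = {name: utilization(k, n, r, c) for name, (r, c) in FUSED_SHAPES.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

For a layer with K=128 and N=32, `best_shape(128, 32)` returns `("4x1", 1.0)`, whereas the square 64x64 shape would reach only 0.5 for the same layer; for K=32 and N=512 the tall-and-narrow choice flips to `"1x4"`.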
πŸ”Ž Similar Papers
No similar papers found.
S
Seock-Hwan Noh
DGIST
Seungpyo Lee
DGIST
Banseok Shin
DGIST
Sehun Park
DGIST
Yongjoo Jang
Korea University, Republic of Korea
Jaeha Kung
Associate Professor, Korea University
Accelerator Design, Approximate Computing, ML Architecture, VLSI