Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats

📅 2025-07-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address area and energy-efficiency bottlenecks in multi-format (INT/FP/MX) multiply-accumulate (MAC) operations for AI accelerators, this paper proposes the Jack unit, a precision-configurable MAC architecture. It combines three techniques: (1) a precision-scalable carry-save multiplier (CSM) that replaces the CSM in the FP multiplier; (2) significand alignment driven by exponent differences, performed inside the CSM; and (3) two-dimensional sub-word parallelism that lets the supported formats share compute resources. In layout implementations at 40 nm, the Jack unit occupies 1.17–2.01× smaller area and consumes 1.05–1.84× lower power than three baseline MAC units. An AI accelerator built from Jack units improves energy efficiency by 1.32–5.41× over a state-of-the-art multi-format accelerator on five AI benchmarks.

📝 Abstract

In this work, we introduce an area- and energy-efficient multiply-accumulate (MAC) unit, named Jack unit, that is a jack-of-all-trades, supporting various data formats such as integer (INT), floating point (FP), and microscaling data format (MX). It provides bit-level flexibility and enhances hardware efficiency by i) replacing the carry-save multiplier (CSM) in the FP multiplier with a precision-scalable CSM, ii) performing the adjustment of significands based on the exponent differences within the CSM, and iii) utilizing 2D sub-word parallelism. To assess effectiveness, we implemented the layout of the Jack unit and three baseline MAC units. Additionally, we designed an AI accelerator equipped with our Jack units to compare with a state-of-the-art AI accelerator supporting various data formats. The proposed MAC unit occupies 1.17~2.01x smaller area and consumes 1.05~1.84x lower power compared to the baseline MAC units. On five AI benchmarks, the accelerator designed with our Jack units improves energy efficiency by 1.32~5.41x over the baseline across various data formats.
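Point ii) of the abstract, adjusting significands by exponent difference before accumulation, can be pictured with a small functional model. The sketch below is an illustration only, not the paper's circuit: the function name `align_and_accumulate`, the `(significand, exponent)` tuple layout, and the `guard_bits` parameter are all assumptions introduced here, and the code does not model the fact that the Jack unit performs this step inside the CSM.

```python
def align_and_accumulate(products, guard_bits=0):
    """Sum terms of the form sig * 2**exp using one shared exponent.

    products   : list of (sig, exp) integer pairs (hypothetical layout).
    guard_bits : extra low-order bits kept before shifting (assumption).
    Returns (acc, exp) with the sum approximated by acc * 2**exp.
    """
    # The largest exponent becomes the common reference.
    max_exp = max(exp for _, exp in products)
    acc = 0
    for sig, exp in products:
        # Smaller-exponent terms are shifted right by the exponent difference.
        shift = max_exp - exp
        acc += (sig << guard_bits) >> shift
    return acc, max_exp - guard_bits


# 15 * 2**0 + 2 * 2**3 = 31 exactly when enough guard bits are kept.
exact = align_and_accumulate([(15, 0), (2, 3)], guard_bits=3)
# Without guard bits the low bits of the smaller term are truncated.
truncated = align_and_accumulate([(15, 0), (2, 3)], guard_bits=0)
```

With three guard bits the result is exact (`31 * 2**0`); with none, the right shift discards the low bits of the smaller term and the result is `3 * 2**3 = 24`. That trade-off between datapath width and accuracy is what any alignment scheme has to manage.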

Problem

Research questions and friction points this paper is trying to address.

Designing an area-efficient MAC unit for diverse data formats

Enhancing energy efficiency in AI accelerator hardware

Supporting INT, FP, and MX formats with scalable precision
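Of the formats listed, MX is likely the least familiar: a block of narrow elements shares a single power-of-two scale. The sketch below shows why that suits a MAC unit, under a deliberately simplified model. The `(shared_exp, elements)` tuple layout and the function name `mx_block_dot` are hypothetical, and plain Python integers stand in for the actual element encodings and block size defined by the MX specification.

```python
import math


def mx_block_dot(block_a, block_b):
    """Dot product of two simplified MX-style blocks.

    Each block is (shared_exp, elements), where every element represents
    elements[i] * 2**shared_exp (hypothetical, simplified layout).
    Returns (int_sum, exp) with the dot product equal to int_sum * 2**exp.
    """
    exp_a, elems_a = block_a
    exp_b, elems_b = block_b
    assert len(elems_a) == len(elems_b)
    # The inner loop is pure integer multiply-accumulate.
    int_sum = sum(a * b for a, b in zip(elems_a, elems_b))
    # The two shared scales are combined once per block, not per element.
    return int_sum, exp_a + exp_b


int_sum, exp = mx_block_dot((-2, [1, 2, 3]), (1, [4, 5, 6]))
value = math.ldexp(int_sum, exp)  # int_sum * 2**exp as a float
```

Because the scale is applied once per block, the per-element work reduces to narrow integer products, which is the kind of operation a shared multi-format multiplier can serve.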

Innovation

Methods, ideas, or system contributions that make the work stand out.

Precision-scalable CSM replaces the CSM inside the FP multiplier

Significand adjustment based on exponent differences is performed within the CSM

2D sub-word parallelism enhances hardware flexibility
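The ideas above can be illustrated with a generic precision-scalable multiplier model. This is a textbook-style sketch, not the Jack unit's actual datapath: it assumes unsigned operands, a 4-bit building block, and hypothetical function names. Splitting both operands into sub-words (the two dimensions) yields four 4x4 partial products that can either be recombined into one 8x8 product or be used as four independent low-precision products.

```python
MASK4 = 0xF  # 4-bit sub-word mask (assumed building-block width)


def mul4(a, b):
    """One 4-bit x 4-bit unsigned multiplier block."""
    assert 0 <= a <= MASK4 and 0 <= b <= MASK4
    return a * b


def mul8_from_4bit_blocks(a, b):
    """High-precision mode: one 8x8 product from four 4x4 blocks."""
    a_hi, a_lo = a >> 4, a & MASK4
    b_hi, b_lo = b >> 4, b & MASK4
    # Each partial product is weighted by the position of its sub-words.
    return ((mul4(a_hi, b_hi) << 8)
            + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
            + mul4(a_lo, b_lo))


def dot4_from_4bit_blocks(a_vec, b_vec):
    """Low-precision mode: the same four blocks compute a 4-element dot product."""
    assert len(a_vec) == len(b_vec) == 4
    return sum(mul4(a, b) for a, b in zip(a_vec, b_vec))


wide = mul8_from_4bit_blocks(200, 123)                      # equals 200 * 123
narrow = dot4_from_4bit_blocks([1, 2, 3, 4], [5, 6, 7, 8])  # 5 + 12 + 21 + 32
```

The same four multiplier blocks serve both modes; only the shifting and summation of their outputs changes. Reusing one array across precisions in this way is the general motivation for precision-scalable designs.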

🔎 Similar Papers

JugglePAC: A Pipelined Accumulation Circuit