🤖 AI Summary
Existing hardware multipliers struggle to support the dynamic precision requirements of mixed-precision quantized neural networks at runtime, forcing a trade-off between resource utilization and model accuracy. To address this challenge, this work proposes a runtime-reconfigurable, multi-precision, multi-channel bit-level systolic array architecture that, for the first time, enables dynamic inter-layer mixed-precision multiplication. By combining bit-level systolic arrays, runtime reconfiguration mechanisms, and multi-channel parallel processing, the proposed architecture achieves a 1.32–3.57× inference speedup on an Ultra96 FPGA, with reduced critical-path delay and support for clock frequencies up to 250 MHz. The design adapts to per-layer precision requirements, preserving model accuracy while maintaining high hardware efficiency.
📝 Abstract
Neural network accelerators have been widely deployed on edge devices for complex tasks such as object tracking and image recognition. Previous works have explored quantization techniques in lightweight accelerator designs to reduce hardware resource consumption. However, low precision incurs significant accuracy loss during inference. Mixed-precision quantization is therefore an attractive alternative, applying different precisions to different layers to trade off resource consumption against accuracy. Because conventional hardware multiplier designs cannot support runtime precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model, we propose a runtime-reconfigurable, multi-precision, multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our design achieves a 1.3185× to 3.5671× speedup when inferring mixed-precision models, and its shorter critical path supports a higher clock frequency (250 MHz).
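Conceptually, a bit-level processing element computes a product by streaming the weight one bit per cycle and accumulating shifted partial products, which is why the same datapath can serve different bit-widths at runtime. Below is a minimal software sketch of this shift-add flow under that general principle; the function and parameter names are illustrative and not taken from the paper:

```python
def bit_serial_mul(activation: int, weight: int, weight_bits: int) -> int:
    """Multiply by streaming `weight` one bit at a time (LSB first).

    Each step adds a shifted copy of `activation` when the current
    weight bit is 1 -- the partial-product flow a bit-level PE pipelines.
    `weight_bits` plays the role of the per-layer precision setting:
    the same loop serves 2-, 4-, or 8-bit layers without a redesign.
    """
    acc = 0
    for i in range(weight_bits):
        if (weight >> i) & 1:
            acc += activation << i  # add activation * 2^i
    return acc

# A 2-bit layer and a 4-bit layer share the same datapath:
print(bit_serial_mul(5, 3, 2))   # 5 * 3 with 2-bit weights -> 15
print(bit_serial_mul(7, 13, 4))  # 7 * 13 with 4-bit weights -> 91
```

In hardware, changing `weight_bits` corresponds to reconfiguring how many bit-serial cycles each channel spends per multiplication, which is what allows precision to vary from layer to layer at runtime.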