🤖 AI Summary
To address the energy-efficiency bottleneck of neural networks, particularly large language models, under FP8 low-precision computation, this work proposes and implements the first FPGA-based, low-power approximate FP8 multiplier. Methodologically, it employs the L-Mul algorithm, a linear-complexity approximate multiplication scheme, and maps it onto AMD Xilinx UltraScale/UltraScale+ FPGAs by leveraging dynamically reconfigurable LUTs and carry-chain primitives, thereby avoiding the area and power overheads inherent in conventional exact multipliers. Experimental evaluation demonstrates that the design reduces multiplier energy consumption by over 40% on average and cuts area usage by approximately 60%, while preserving >99% Top-1 accuracy across multiple neural network inference tasks. This work establishes a hardware-deployable paradigm for energy-efficient approximate computing in FP8 accelerators.
📝 Abstract
Multiplication is a core operation in modern neural network (NN) computations, contributing significantly to energy consumption. The linear-complexity multiplication (L-Mul) algorithm was proposed as an approximate multiplication method for emerging NN models, such as large language models (LLMs), to reduce the energy consumption and computational complexity of multiplications. However, hardware implementations of L-Mul have not yet been reported. Additionally, the 8-bit floating-point (FP8) format offers a wider dynamic range than the traditional 8-bit integer (INT8) format, making it increasingly popular and widely adopted in NN computations. This paper therefore presents a power-efficient FPGA-based hardware implementation (an approximate FP8 multiplier) of L-Mul. The core computation is implemented using the dynamically reconfigurable lookup table (LUT) and carry-chain primitives available in AMD Xilinx UltraScale/UltraScale+ technology. The accuracy and resource utilization of the approximate multiplier are evaluated and analyzed. Furthermore, the FP8 approximate multiplier is deployed in the inference phase of representative NN models to validate its effectiveness.
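For intuition, the arithmetic that L-Mul replaces hardware multiplication with can be sketched as a small behavioral model. This is not the paper's FPGA design: the function names, the unpacked sign/exponent/mantissa representation, and the E4M3-style field widths are illustrative assumptions; the offset rule `2^-l(m)` follows the published L-Mul definition, which drops the mantissa cross term `mx*my` and substitutes a constant.

```python
# Behavioral sketch of the L-Mul approximation for FP8-like operands.
# Assumptions: operands are already unpacked into (sign, unbiased exponent,
# fractional mantissa in [0, 1)); E4M3-style 3-bit mantissa is illustrative.

MANT_BITS = 3  # FP8 E4M3 mantissa width (assumed for this sketch)

def l(m):
    """L-Mul offset exponent: l(m) = m if m <= 3, 3 if m == 4, 4 if m > 4."""
    return m if m <= 3 else (3 if m == 4 else 4)

def lmul(sx, ex, mx, sy, ey, my):
    """Approximate ((1+mx)*2^ex) * ((1+my)*2^ey) without multiplying mantissas.

    The exact product mantissa is 1 + mx + my + mx*my; L-Mul replaces the
    mx*my cross term with the constant 2^-l(MANT_BITS), so the datapath
    needs only adders (mantissa add, exponent add, and a normalize step).
    """
    frac = 1.0 + mx + my + 2.0 ** -l(MANT_BITS)
    e = ex + ey
    if frac >= 2.0:          # renormalize if the mantissa sum carried out
        frac /= 2.0
        e += 1
    sign = -1.0 if (sx ^ sy) else 1.0
    return sign * frac * 2.0 ** e

# Example: 1.5 * 1.5 (exact 2.25) approximated as (1 + 0.5 + 0.5 + 0.125) = 2.125
print(lmul(0, 0, 0.5, 0, 0, 0.5))  # → 2.125
```

Because the cross term is bounded by the mantissa precision, the substitution keeps the relative error small (here |2.25 − 2.125|/2.25 ≈ 5.6%) while eliminating the mantissa multiplier entirely, which is what makes a LUT-and-carry-chain FPGA mapping attractive.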