🤖 AI Summary
To address the low energy efficiency and implementation complexity of FP-INT mixed-precision GEMM, which arises when deploying weight-quantized large language models (LLMs) on general-purpose hardware, this paper proposes a lookup-table (LUT)-based domain-specific accelerator architecture. The design combines a half-size LUT structure, a weight-pattern-driven precomputation and indexing mechanism, and configurable decoding/multiplexing circuits, overcoming conventional memory constraints on LUT size and access latency while supporting multiple bit precisions and quantization schemes on a single fixed hardware configuration. At the same 3-bit weight precision, the accelerator delivers 59% higher energy efficiency (TOPS/W) and 20% lower perplexity than state-of-the-art accelerator designs; at matched perplexity, it achieves 98% higher energy efficiency by performing 2.4-bit weight operations.
📝 Abstract
Weight-only quantization has emerged as a promising solution to the deployment challenges of large language models (LLMs). However, it necessitates FP-INT operations, which make implementation on general-purpose hardware like GPUs difficult. In this paper, we propose FIGLUT, an efficient look-up table (LUT)-based GEMM accelerator architecture. Instead of performing traditional arithmetic operations, FIGLUT retrieves precomputed values from an LUT based on weight patterns, significantly reducing the computational complexity. We also introduce a novel LUT design that addresses the limitations of conventional memory architectures. To further improve LUT-based operations, we propose a half-size LUT combined with a dedicated decoding and multiplexing unit. FIGLUT efficiently supports different bit precisions and quantization methods using a single fixed hardware configuration. For the same 3-bit weight precision, FIGLUT demonstrates 59% higher TOPS/W and 20% lower perplexity than state-of-the-art accelerator designs. When targeting the same perplexity, FIGLUT achieves 98% higher TOPS/W by performing 2.4-bit operations.
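The core idea, replacing FP-INT arithmetic with table lookups indexed by weight bit patterns, can be sketched in plain Python. This is a minimal illustration under assumed conventions, not the paper's hardware design: the function name `lut_gemv`, the group size `mu`, and the bit encoding (bit set adds the activation, bit clear subtracts it) are all illustrative assumptions.

```python
def lut_gemv(x, bit_patterns, mu=4):
    """Sketch of LUT-based GEMV over binary-coded (sign) weights.

    x: activation vector, length divisible by mu.
    bit_patterns: one mu-bit integer per activation group for each output
        row; bit b set -> add x[g*mu + b], bit b clear -> subtract it.
    """
    n_groups = len(x) // mu
    # Precompute, for every activation group, all 2^mu signed partial
    # sums. This table replaces multipliers: weights only index into it.
    # (The paper's half-size LUT exploits the symmetry
    # lut[g][p] == -lut[g][2**mu - 1 - p] to store half the entries and
    # recover the rest with decoding/mux logic; the full table is kept
    # here for clarity.)
    lut = [[sum(x[g * mu + b] if (p >> b) & 1 else -x[g * mu + b]
                for b in range(mu))
            for p in range(2 ** mu)]
           for g in range(n_groups)]
    # Each output element is just a sum of table lookups, one per group.
    return [sum(lut[g][p] for g, p in enumerate(row))
            for row in bit_patterns]
```

Note that the table-build cost is amortized: one LUT serves every output row, so for large weight matrices the per-row work collapses to gathers and additions with no multiplications.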