🤖 AI Summary
This work addresses the lack of a systematic, multi-granularity comparison between low-precision floating-point (FP) and integer (INT) quantization for handling LLM activation outliers on modern AI hardware (e.g., Nvidia's Blackwell architecture). Through algorithm–hardware co-design, we propose three techniques: fine-grained block-wise quantization, Hadamard rotation to suppress outliers, and symmetric clipping to eliminate gradient bias in low-bit training. We demonstrate for the first time that MXINT8 achieves near-lossless training, surpassing its FP counterpart in both accuracy and energy efficiency, and that NVINT4, when combined with outlier control, outperforms NVFP4. These findings challenge the FP-dominant paradigm in AI hardware design, establish a unified evaluation framework for low-bit quantization, and chart an integer-first, co-optimized path toward efficient LLM inference and training.
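To make the first two techniques concrete, here is a minimal NumPy sketch of fine-grained block-wise INT8 quantization in the spirit of MXINT8: each block of 32 values shares one power-of-two scale, and values are rounded into a *symmetric* signed range. This is an illustrative toy, not the paper's implementation; the block size of 32, the power-of-two scale rule, and the symmetric range [-127, 127] are assumptions made for the sketch.

```python
import numpy as np

def mxint8_quantize(x, block_size=32):
    """Block-wise INT8 quantization with one shared power-of-two
    scale per block (a simplified MXINT8-style sketch)."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # Shared per-block scale: smallest power of two that maps the
    # block's max magnitude into the symmetric range [-127, 127].
    # (Symmetric clipping: we deliberately avoid -128, so positive and
    # negative values are treated identically and rounding is unbiased.)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)  # avoid log2(0) for all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(amax / 127.0))
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale

def mxint8_dequantize(q, scale, n):
    """Reconstruct the original array (trimming any padding)."""
    return (q.astype(np.float64) * scale).reshape(-1)[:n]

x = np.linspace(-1.0, 1.0, 64)
q, s = mxint8_quantize(x)
xr = mxint8_dequantize(q, s, len(x))
print(np.max(np.abs(x - xr)))  # reconstruction error bounded by half a step
```

The fine granularity is the key point: because each 32-element block carries its own scale, one outlier only degrades the precision of its own block rather than of the entire tensor.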
📝 Abstract
Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage, though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.
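The Hadamard-rotation idea mentioned above can be sketched in a few lines of NumPy: an orthogonal Hadamard transform spreads a single large activation's energy across all coordinates, shrinking the dynamic range that the quantizer must cover, and is exactly invertible. This is a toy illustration under simple assumptions (power-of-two dimension, Sylvester construction), not the paper's kernel.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix of size n (n must be a power of two),
    built by Sylvester's recursive construction."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# An outlier-heavy activation block: one entry dominates the range.
x = np.full(32, 1.0)
x[0] = 100.0

H = hadamard(32)
y = H @ x  # rotated activations: the outlier's energy is spread out

# The rotation is orthogonal, so quantizing y loses no information that
# cannot be recovered: H.T @ y reconstructs x exactly.
print(np.max(np.abs(x)), np.max(np.abs(y)))  # max magnitude shrinks sharply
```

Because the max magnitude after rotation is far smaller while the norm is preserved, a low-bit integer format like NVINT4 wastes fewer codes on a single outlier, which is the mechanism behind NVINT4 overtaking NVFP4 once rotation is applied.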