🤖 AI Summary
Ultra-low-precision quantization (e.g., FP4) in large language model (LLM) training suffers from severe quantization errors and activation collapse, undermining training stability and convergence.
Method: This paper introduces the first FP4 quantization framework tailored for LLM training, featuring a differentiable quantization estimator, an outlier clamping-and-compensation mechanism, a mixed-precision training scheme, and vector-wise quantization to enhance numerical stability.
Contribution/Results: The authors train a 13B-parameter LLM on up to 100B tokens using FP4 precision, achieving final accuracy on par with BF16 and FP8 baselines. As the first framework to enable stable, efficient FP4 training, it provides a scalable algorithmic foundation for next-generation ultra-low-precision hardware, advancing LLM training toward extreme energy efficiency.
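To make the "differentiable quantization estimator" idea concrete, here is a minimal NumPy sketch of a clipped straight-through-style quantizer: the forward pass snaps values to an FP4 (E2M1) grid, and the backward pass lets gradients flow unchanged inside the representable range. This is a simplified baseline for illustration, not the paper's exact estimator, and the function names are invented for this sketch.

```python
import numpy as np

# Representable values of an E2M1 FP4 format (1 sign, 2 exponent, 1 mantissa bit).
FP4_LEVELS = np.array([-6., -4., -3., -2., -1.5, -1., -0.5, 0.,
                       0.5, 1., 1.5, 2., 3., 4., 6.])

def quant_forward(x):
    # Snap each element to the nearest representable FP4 level.
    idx = np.argmin(np.abs(x[..., None] - FP4_LEVELS), axis=-1)
    return FP4_LEVELS[idx]

def quant_backward(x, grad_out):
    # Clipped straight-through gradient: identity inside the representable
    # range, zero outside (where the quantizer saturates).
    inside = (x >= FP4_LEVELS.min()) & (x <= FP4_LEVELS.max())
    return grad_out * inside

x = np.array([0.26, 5.1, 7.0, -0.1])
y = quant_forward(x)                      # values snapped onto the FP4 grid
g = quant_backward(x, np.ones_like(x))    # gradient blocked for the outlier 7.0
```

Because the rounding step itself has zero gradient almost everywhere, some surrogate like this is needed for weight updates to work at all; the paper's estimator refines this plain pass-through behavior for more precise updates.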
📝 Abstract
The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.
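The two stability mechanisms named in the abstract can be sketched together in a few lines of NumPy: outliers are clamped at a per-vector quantile (with the clamped-off portion kept as a high-precision compensation residual), and each vector is scaled independently onto the FP4 dynamic range before rounding. This is an illustrative simulation under assumed details (E2M1 format, absmax scaling, a 99th-percentile clamp), not the paper's implementation.

```python
import numpy as np

# Representable magnitudes of an E2M1 FP4 format; dynamic range is [0, 6].
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_vectorwise(x, clamp_quantile=0.99):
    """Simulate vector-wise FP4 quantization with outlier clamping.

    Returns (dequantized tensor, high-precision residual compensating
    for the clamped outliers).
    """
    # Outlier clamping: cap magnitudes at a per-row quantile, and keep
    # the clamped-off error as a compensation term.
    thresh = np.quantile(np.abs(x), clamp_quantile, axis=-1, keepdims=True)
    clamped = np.clip(x, -thresh, thresh)
    residual = x - clamped

    # Vector-wise (per-row) absmax scaling onto the FP4 range.
    scale = np.max(np.abs(clamped), axis=-1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)
    scaled = clamped / scale

    # Round each magnitude to the nearest representable FP4 value.
    dist = np.abs(np.abs(scaled)[..., None] - FP4_GRID)
    nearest = FP4_GRID[np.argmin(dist, axis=-1)]
    return np.sign(scaled) * nearest * scale, residual

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
xq, res = quantize_fp4_vectorwise(x)
```

Per-vector scaling matters because FP4 has only 16 code points: a single unclamped outlier would otherwise stretch the scale and crush the rest of the vector toward zero, the "activation collapse" failure mode the abstract describes.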