TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control

📅 2025-10-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the severe accuracy degradation in 4-bit fully quantized training of large language models (LLMs) caused by weight oscillation and outliers, this paper proposes an end-to-end NVFP4-based fully quantized training framework. The method introduces three core innovations: (1) unbiased double-block quantization to mitigate cumulative quantization bias; (2) OsciReset, a dynamic oscillation-suppression algorithm that stabilizes weight updates; and (3) OutControl, an outlier-management mechanism that adaptively preserves high precision for outlier gradients and activations. The framework applies 4-bit NVFP4 quantization uniformly to weights, activations, and gradients, integrated with error compensation and dynamic outlier handling. Experiments on models up to 370M parameters trained on up to 200B tokens show that the approach reduces the performance gap to full-precision training by 51.3% on average compared with existing FP4 methods, significantly improving training stability and convergence under ultra-low-precision conditions.
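The double-block idea can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: it assumes NVFP4's general layout (small blocks with a per-block scale nested under a tensor-level FP32 scale, values on the FP4 E2M1 grid) and uses stochastic rounding, one standard way to make a quantizer unbiased in expectation. Block size, scale encoding, and the rounding rule are all assumptions here.

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4(x, block=16, rng=None):
    """Double-block FP4 fake-quantization sketch (quantize + dequantize).

    Two scale levels: a per-tensor FP32 scale, and a per-block scale
    (stored as FP8 E4M3 in actual NVFP4 hardware; kept in float here).
    Stochastic rounding makes the quantizer unbiased: E[q] = x.
    """
    rng = rng or np.random.default_rng(0)
    shape = x.shape
    xb = x.reshape(-1, block)

    # Level 1: tensor-wide scale, chosen so block scales fit E4M3's range.
    tensor_scale = np.abs(xb).max() / (6.0 * 448.0) + 1e-12
    # Level 2: per-block scale mapping each block's amax onto the FP4 max (6).
    block_amax = np.abs(xb).max(axis=1, keepdims=True)
    block_scale = block_amax / 6.0 / tensor_scale + 1e-12

    y = xb / (block_scale * tensor_scale)
    sign, mag = np.sign(y), np.abs(y)

    # Stochastic rounding between the two nearest grid points.
    hi = np.searchsorted(E2M1_GRID, mag).clip(1, len(E2M1_GRID) - 1)
    lo = hi - 1
    g_lo, g_hi = E2M1_GRID[lo], E2M1_GRID[hi]
    p_hi = (mag - g_lo) / (g_hi - g_lo + 1e-12)
    q = np.where(rng.random(mag.shape) < p_hi, g_hi, g_lo)

    return (sign * q * block_scale * tensor_scale).reshape(shape)
```

Averaging many stochastic quantizations of the same tensor recovers the original values, which is the "unbiased" property the paper relies on to avoid cumulative quantization bias over long training runs.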

📝 Abstract
Training Large Language Models (LLMs) is prohibitively expensive, driving interest in low-precision fully-quantized training (FQT). While novel 4-bit formats like NVFP4 offer substantial efficiency gains, achieving near-lossless training at such low precision remains challenging. We introduce TetraJet-v2, an end-to-end 4-bit FQT method that leverages NVFP4 for activations, weights, and gradients in all linear layers. We identify two critical issues hindering low-precision LLM training: weight oscillation and outliers. To address these, we propose: 1) an unbiased double-block quantization method for NVFP4 linear layers, 2) OsciReset, an algorithm to suppress weight oscillation, and 3) OutControl, an algorithm to retain outlier accuracy. TetraJet-v2 consistently outperforms prior FP4 training methods on pre-training LLMs across varying model sizes up to 370M and data sizes up to 200B tokens, reducing the performance gap to full-precision training by an average of 51.3%.
Problem

Research questions and friction points this paper is trying to address.

Achieving near-lossless LLM training with 4-bit quantization
Addressing weight oscillation issues in low-precision model training
Controlling outlier effects to maintain accuracy in quantized models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbiased double-block NVFP4 quantization for linear layers
OsciReset algorithm suppresses weight oscillation
OutControl algorithm preserves accuracy for outlier activations and gradients
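To make the oscillation-suppression idea concrete, here is a minimal sketch of an OsciReset-style mechanism. The paper's exact rule is not given in this summary, so everything below is an assumption: it tracks, per weight, an exponential moving average of how often the weight's quantized value flips between training steps, and resets weights whose flip rate exceeds a threshold onto their current quantized value, damping back-and-forth crossings of a quantization boundary. The class name, EMA decay, and reset rule are all hypothetical.

```python
import numpy as np

class OsciResetSketch:
    """Hypothetical oscillation suppressor: detect weights whose quantized
    value keeps flipping across steps and snap them onto the grid."""

    def __init__(self, shape, ema=0.9, threshold=0.5):
        self.prev_q = np.zeros(shape)
        self.flip_rate = np.zeros(shape)  # EMA of per-weight flip events
        self.ema = ema
        self.threshold = threshold

    def step(self, w, quantize):
        q = quantize(w)
        flipped = (q != self.prev_q).astype(float)
        self.flip_rate = self.ema * self.flip_rate + (1 - self.ema) * flipped
        mask = self.flip_rate > self.threshold
        # Reset oscillating weights to their quantized value, so they stop
        # bouncing between adjacent quantization levels.
        w = np.where(mask, q, w)
        self.flip_rate = np.where(mask, 0.0, self.flip_rate)
        self.prev_q = q
        return w
```

A toy usage: a weight bouncing between 0.45 and 0.55 under nearest-integer quantization flips its quantized value every step, so its flip rate climbs until the suppressor pins it to one level.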