Metis: Training Large Language Models with Advanced Low-Bit Quantization

📅 2025-08-30
📈 Citations: 0 · Influential: 0
🤖 AI Summary
During low-bit quantized training of large language models, anisotropic parameter distributions produce excessively wide singular value spectra, which conflict with the inherent bias of block-wise quantization and cause training instability and performance degradation. To address this, we propose a spectral-domain decoupling mechanism: (i) spectral decomposition coupled with random embedding to separate dominant and long-tail components; (ii) spectral-domain adaptive learning rates that dynamically scale update steps according to spectral energy; and (iii) dual-range regularization that separately constrains quantization error in the dominant components and in the residuals. Our method is the first to achieve, within standard training pipelines, FP8-quantized models that surpass FP32 baselines in accuracy and FP4-quantized models that match FP32 performance. It significantly improves training stability, convergence speed, and scalability across bit-widths.
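The decomposition in (i) can be pictured with a randomized SVD. The sketch below is illustrative, not the authors' implementation: the rank `k`, the oversampling `p`, and the helper name `split_spectrum` are assumptions. It shows how a random Gaussian embedding isolates a low-rank dominant component, leaving a narrow-range residual that is friendlier to block-wise quantization.

```python
# Illustrative sketch, not the authors' code: separate a weight matrix into
# a low-rank "dominant" component and a narrow-range "tail" residual using
# a randomized SVD. Rank k and oversampling p are assumed for illustration.
import torch

def split_spectrum(W: torch.Tensor, k: int = 8, p: int = 8):
    m, n = W.shape
    Omega = torch.randn(n, k + p, dtype=W.dtype)   # random Gaussian embedding
    Y = W @ Omega                                  # thin sketch of range(W)
    Q, _ = torch.linalg.qr(Y)                      # orthonormal basis, (m, k+p)
    B = Q.T @ W                                    # small (k+p, n) projection
    U_b, S, Vh = torch.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    W_dom = (U[:, :k] * S[:k]) @ Vh[:k]            # top-k dominant component
    W_tail = W - W_dom                             # long tail: narrow range
    return W_dom, W_tail

W = torch.randn(512, 512)
W[:, 0] *= 50.0                                    # inject a dominant direction
W_dom, W_tail = split_spectrum(W)
print(W.abs().max().item(), W_tail.abs().max().item())  # tail range is far narrower
```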

📝 Abstract
This work identifies anisotropic parameter distributions as a fundamental barrier to training large language models (LLMs) with low-bit quantization: a few dominant singular values create wide numerical ranges that conflict with the inherent bias of block-wise quantization. This bias disproportionately preserves high-magnitude values while discarding smaller ones, causing training instability and degraded model performance. This work introduces Metis, a training framework that combines (i) spectral decomposition with random embedding to efficiently disentangle dominant from long-tail components, compressing broad distributions into quantization-friendly narrow ranges; (ii) adaptive learning rates in the spectral domain to amplify underrepresented directions and better capture diverse features critical for performance; and (iii) a dual-range regularizer that jointly constrains numerical precision and parameter range distribution, ensuring stable, unbiased low-bit training. With Metis, FP8 training surpasses FP32 baselines, and FP4 training achieves accuracy comparable to FP32, paving the way for robust and scalable LLM training under advanced low-bit quantization. The code implementation for Metis is available at: https://github.com/typename-yyf/Metis-quantization.
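As a rough illustration of the dual-range idea in (iii), the sketch below pairs a quantization-error penalty on the dominant component with a range penalty on the residual. The penalty forms, the fake-quantization helper, and the weights `lam_dom`/`lam_tail` are assumptions for illustration, not the paper's objective.

```python
# Hypothetical sketch of a dual-range regularizer: one term penalizes the
# quantization error of the dominant component, the other constrains the
# dynamic range of the residual. Loss form and weights are assumptions.
import torch

def fake_quant(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization, for this demo only."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def dual_range_penalty(W_dom, W_tail, lam_dom=1.0, lam_tail=0.1):
    err_dom = (W_dom - fake_quant(W_dom)).pow(2).mean()  # precision term
    range_tail = W_tail.abs().max()                      # range term
    return lam_dom * err_dom + lam_tail * range_tail     # added to the loss
```

If such a penalty were used during training, the non-differentiable `round` would need a straight-through estimator; the sketch only conveys the two-term structure.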
Problem

Research questions and friction points this paper is trying to address.

Anisotropic distributions hinder low-bit LLM quantization
Block-wise quantization bias causes training instability (see the numeric sketch after this list)
Dominant singular values conflict with quantization ranges
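The bias named above can be seen with a few numbers. This demo is assumed for illustration: one outlier in a block inflates the shared scale, and a 4-bit grid then rounds the block's small values to zero.

```python
# Assumed numeric demo (not from the paper) of block-wise quantization bias:
# a single dominant value sets the block's scale, so small values collapse.
import torch

def quantize_block(block: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                 # 7 levels each side at 4 bits
    scale = block.abs().max() / qmax           # one shared scale per block
    return torch.round(block / scale).clamp(-qmax, qmax) * scale

block = torch.tensor([50.0, 0.3, -0.2, 0.1])   # one dominant, three small values
print(quantize_block(block))                   # tensor([50., 0., -0., 0.])
```
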
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral decomposition with random embedding
Adaptive learning rates in the spectral domain (sketched after this list)
Dual-range regularizer for numerical precision
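The sketch below gives one plausible reading of spectral-domain adaptive learning rates: project the gradient into the singular basis and enlarge steps along low-energy directions. The `1/sqrt(S)` scaling rule and the function name are assumptions, not the paper's rule.

```python
# Hedged sketch of a spectral-domain adaptive step: update each singular
# direction at a rate inversely tied to its spectral energy, so
# underrepresented tail directions get larger steps.
import torch

def spectral_adaptive_step(W, grad, base_lr=1e-2, eps=1e-8):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    g_spec = U.T @ grad @ Vh.T                 # gradient in the spectral basis
    lr = base_lr / torch.sqrt(S + eps)         # larger steps for tail directions
    g_spec = g_spec * lr.unsqueeze(1)          # scale row i by its direction's rate
    return W - U @ g_spec @ Vh                 # map the scaled update back
```
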
Authors

Hengjie Cao · Fudan University
Mengyi Chen · Fudan University
Yifeng Yang · Department of Computer Science, Shanghai Jiaotong University · Machine Learning
Ruijun Huang · Fudan University
Fang Dong · Southeast University · Edge Computing, Cloud, AIoT
Jixian Zhou · Fudan University
Anrui Chen · Fudan University
Mingzhi Dong · University of Bath
Yujiang Wang · Oxford Suzhou Centre for Advanced Research
Jinlong Hou · Shanghai Innovation Institute (SII) · machine learning, deep learning, high performance computing, drug discovery, medical
Yuan Cheng · Shanghai Innovation Institute
Fan Wu · Huawei
Fan Yang · Fudan University
Tun Lu · Fudan University
Ning Gu · Fudan University
Li Shang · Fudan University