Metis: Training Large Language Models with Advanced Low-Bit Quantization

📅 2025-08-30
📈 Citations: 0 · Influential: 0
🤖 AI Summary
During low-bit quantized training of large language models, anisotropic parameter distributions produce excessively wide singular value spectra, which conflict with the inherent bias of block-wise quantization and cause training instability and performance degradation. To address this, we propose a spectral-domain decoupling mechanism: (i) spectral decomposition coupled with random embedding to separate dominant and long-tail components; (ii) spectral-domain adaptive learning rates that dynamically scale update steps according to spectral energy; and (iii) dual-range regularization that separately constrains quantization error in the dominant components and in the residuals. Our method is the first to achieve, within standard training pipelines, FP8-quantized models that surpass FP32 baselines in accuracy and FP4-quantized models that match FP32 performance. It significantly improves training stability, convergence speed, and scalability across bit-widths.
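The decomposition in (i) can be pictured with a randomized SVD. The sketch below is illustrative, not the authors' implementation: the rank `k`, the oversampling `p`, and the helper name `split_spectrum` are assumptions. It shows how a random Gaussian embedding isolates a low-rank dominant component, leaving a narrow-range residual that is friendlier to block-wise quantization.

```python
# Illustrative sketch, not the authors' code: separate a weight matrix into
# a low-rank "dominant" component and a narrow-range "tail" residual using
# a randomized SVD. Rank k and oversampling p are assumed for illustration.
import torch

def split_spectrum(W: torch.Tensor, k: int = 8, p: int = 8):
    m, n = W.shape
    Omega = torch.randn(n, k + p, dtype=W.dtype)   # random Gaussian embedding
    Y = W @ Omega                                  # thin sketch of range(W)
    Q, _ = torch.linalg.qr(Y)                      # orthonormal basis, (m, k+p)
    B = Q.T @ W                                    # small (k+p, n) projection
    U_b, S, Vh = torch.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    W_dom = (U[:, :k] * S[:k]) @ Vh[:k]            # top-k dominant component
    W_tail = W - W_dom                             # long tail: narrow range
    return W_dom, W_tail

W = torch.randn(512, 512)
W[:, 0] *= 50.0                                    # inject a dominant direction
W_dom, W_tail = split_spectrum(W)
print(W.abs().max().item(), W_tail.abs().max().item())  # tail range is far narrower
```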

📝 Abstract
This work identifies anisotropic parameter distributions as a fundamental barrier to training large language models (LLMs) with low-bit quantization: a few dominant singular values create wide numerical ranges that conflict with the inherent bias of block-wise quantization. This bias disproportionately preserves high-magnitude values while discarding smaller ones, causing training instability and degraded model performance. This work introduces Metis, a training framework that combines (i) spectral decomposition with random embedding to efficiently disentangle dominant from long-tail components, compressing broad distributions into quantization-friendly narrow ranges; (ii) adaptive learning rates in the spectral domain to amplify underrepresented directions and better capture diverse features critical for performance; and (iii) a dual-range regularizer that jointly constrains numerical precision and parameter range distribution, ensuring stable, unbiased low-bit training. With Metis, FP8 training surpasses FP32 baselines, and FP4 training achieves accuracy comparable to FP32, paving the way for robust and scalable LLM training under advanced low-bit quantization. The code implementation for Metis is available at: https://github.com/typename-yyf/Metis-quantization.
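As a rough illustration of the dual-range idea in (iii), the sketch below pairs a quantization-error penalty on the dominant component with a range penalty on the residual. The penalty forms, the fake-quantization helper, and the weights `lam_dom`/`lam_tail` are assumptions for illustration, not the paper's objective.

```python
# Hypothetical sketch of a dual-range regularizer: one term penalizes the
# quantization error of the dominant component, the other constrains the
# dynamic range of the residual. Loss form and weights are assumptions.
import torch

def fake_quant(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization, for this demo only."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def dual_range_penalty(W_dom, W_tail, lam_dom=1.0, lam_tail=0.1):
    err_dom = (W_dom - fake_quant(W_dom)).pow(2).mean()  # precision term
    range_tail = W_tail.abs().max()                      # range term
    return lam_dom * err_dom + lam_tail * range_tail     # added to the loss
```

If such a penalty were used during training, the non-differentiable `round` would need a straight-through estimator; the sketch only conveys the two-term structure.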
Problem

Research questions and friction points this paper is trying to address.

Anisotropic distributions hinder low-bit LLM quantization
Block-wise quantization bias causes training instability (see the numeric sketch after this list)
Dominant singular values conflict with quantization ranges
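The bias named above can be seen with a few numbers. This demo is assumed for illustration: one outlier in a block inflates the shared scale, and a 4-bit grid then rounds the block's small values to zero.

```python
# Assumed numeric demo (not from the paper) of block-wise quantization bias:
# a single dominant value sets the block's scale, so small values collapse.
import torch

def quantize_block(block: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                 # 7 levels each side at 4 bits
    scale = block.abs().max() / qmax           # one shared scale per block
    return torch.round(block / scale).clamp(-qmax, qmax) * scale

block = torch.tensor([50.0, 0.3, -0.2, 0.1])   # one dominant, three small values
print(quantize_block(block))                   # tensor([50., 0., -0., 0.])
```
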
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral decomposition with random embedding
Adaptive learning rates in the spectral domain (sketched after this list)
Dual-range regularizer for numerical precision
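The sketch below gives one plausible reading of spectral-domain adaptive learning rates: project the gradient into the singular basis and enlarge steps along low-energy directions. The `1/sqrt(S)` scaling rule and the function name are assumptions, not the paper's rule.

```python
# Hedged sketch of a spectral-domain adaptive step: update each singular
# direction at a rate inversely tied to its spectral energy, so
# underrepresented tail directions get larger steps.
import torch

def spectral_adaptive_step(W, grad, base_lr=1e-2, eps=1e-8):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    g_spec = U.T @ grad @ Vh.T                 # gradient in the spectral basis
    lr = base_lr / torch.sqrt(S + eps)         # larger steps for tail directions
    g_spec = g_spec * lr.unsqueeze(1)          # scale row i by its direction's rate
    return W - U @ g_spec @ Vh                 # map the scaled update back
```
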
Authors

Hengjie Cao · Fudan University
Mengyi Chen · Fudan University
Yifeng Yang · Department of Computer Science, Shanghai Jiaotong University · Machine Learning
Ruijun Huang · Fudan University
Fang Dong · Southeast University · Edge Computing, Cloud, AIoT
Jixian Zhou · Fudan University
Anrui Chen · Fudan University
Mingzhi Dong · University of Bath
Yujiang Wang · Oxford Suzhou Centre for Advanced Research
Jinlong Hou · Shanghai Innovation Institute (SII) · machine learning, deep learning, high performance computing, drug discovery, medical
Yuan Cheng · Shanghai Innovation Institute
Fan Wu · Huawei
Fan Yang · Fudan University
Tun Lu · Fudan University
Ning Gu · Fudan University
Li Shang · Fudan University