SingleQuant: Efficient Quantization of Large Language Models in a Single Pass

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM quantization methods couple gradient optimization with quantization truncation, which introduces non-smoothness and gradient noise when the straight-through estimator (STE) is applied on the Stiefel manifold, causing slow convergence, instability, and performance degradation. This paper proposes SingleQuant, a single-pass quantization framework whose core innovation is decoupling truncation from optimization via two novel orthogonal transformations: the Alignment Rotation Transformation (ART) and the Uniformity Rotation Transformation (URT). Built from Givens rotations, these enable closed-form optimal distribution reshaping and eliminate STE-related artifacts. Combined with manifold-aware optimization and geometric control, SingleQuant yields smooth, stable, and rapid training. Evaluated on models ranging from 7B to 70B parameters, it significantly outperforms the baselines, achieving a 1,400× quantization speedup on LLaMA-2-13B and a +0.57% average improvement across standard benchmarks.
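
The summary above leans on Givens rotations with closed-form angles. As an illustrative sketch only (not the paper's actual ART/URT construction), the NumPy snippet below builds a single Givens rotation on a chosen coordinate pair and picks its angle in closed form so that an activation outlier is spread evenly over the two coordinates; the helper names `givens_rotation` and `equalizing_angle` are hypothetical.

```python
import numpy as np

def givens_rotation(dim: int, i: int, j: int, theta: float) -> np.ndarray:
    """Return a dim x dim Givens rotation acting only on coordinates (i, j)."""
    G = np.eye(dim)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def equalizing_angle(x: np.ndarray, i: int, j: int) -> float:
    """Closed-form angle that makes |y[i]| == |y[j]| after rotating x.

    Writing (x[i], x[j]) = r * (cos(phi), sin(phi)), the rotated pair is
    r * (cos(phi + theta), sin(phi + theta)); choosing theta = pi/4 - phi
    lands the pair on the diagonal, so an outlier is shared by both slots.
    """
    phi = np.arctan2(x[j], x[i])
    return np.pi / 4 - phi

# Tiny demo: a large activation outlier sits in coordinate 0.
x = np.array([12.0, 0.3, -0.1, 0.2])
theta = equalizing_angle(x, 0, 1)
G = givens_rotation(len(x), 0, 1, theta)
y = G @ x
print(np.round(y, 3))                                    # coords 0 and 1 now share the outlier mass
print(np.isclose(np.linalg.norm(x), np.linalg.norm(y)))  # the rotation preserves the norm
```

In a real pipeline many such pair-wise rotations with fixed indices and precomputed angles would be composed, which is roughly what makes a closed-form, single-pass scheme possible; the composition strategy here is not taken from the paper.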

📝 Abstract
Large Language Model (LLM) quantization facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization truncation suffer from serious convergence pathologies. This prolongs quantization time and degrades LLMs' task performance. Our studies confirm that the Straight-Through Estimator (STE) on Stiefel manifolds introduces non-smoothness and gradient noise, obstructing optimization convergence and blocking high-fidelity quantized LLM development despite extensive training. To tackle these limitations, we propose SingleQuant, a single-pass quantization framework that decouples optimization from quantization truncation, thereby eliminating the non-smoothness and gradient-noise factors above. Specifically, SingleQuant constructs an Alignment Rotation Transformation (ART) and a Uniformity Rotation Transformation (URT) targeting distinct activation outliers: ART smooths outlier values via closed-form optimal rotations, and URT reshapes distributions through geometric mapping. Both matrices comprise strictly formulated Givens rotations with predetermined dimensions and rotation angles, enabling strong LLM task performance within a short time. Experimental results demonstrate SingleQuant's superiority over the selected baselines across diverse tasks on 7B-70B LLMs: quantized LLMs achieve higher task performance while requiring less time for quantization. For example, when quantizing LLaMA-2-13B, SingleQuant achieves a 1,400× quantization speedup and a +0.57% gain in average task performance compared to the best selected baseline.
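
For background on the coupling of gradient optimization with quantization truncation that the abstract criticizes, the PyTorch sketch below shows plain uniform quantization with a straight-through estimator (STE): the forward pass scales, rounds, and clamps (the non-smooth truncation), while the backward pass routes gradients through as if the rounding were the identity. This is generic STE background under our own assumptions, not code from SingleQuant; `ste_uniform_quant` and its bit-width are illustrative.

```python
import torch

def ste_uniform_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization with a straight-through estimator.

    Forward: scale to an n_bits signed integer grid, round, clamp, rescale.
    Backward: (w_q - w).detach() + w passes gradients straight through the
    rounding step, which is the approximation that injects gradient noise
    when it is coupled with the optimization of other parameters.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    w_q = w_int * scale
    return (w_q - w).detach() + w  # identity gradient w.r.t. w

# Gradients flow as if no rounding had happened:
w = torch.randn(8, requires_grad=True)
loss = ste_uniform_quant(w).pow(2).sum()
loss.backward()
print(w.grad)  # 2 * quantized values, routed unchanged through the rounding
```
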
Problem

Research questions and friction points this paper is trying to address.

Addresses convergence issues in LLM quantization methods
Proposes a single-pass framework to eliminate gradient noise
Enhances quantization speed and model task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-pass quantization framework decouples optimization from quantization truncation
Alignment and Uniformity Rotation Transformations (ART/URT) target distinct activation outliers
Givens rotations with predetermined dimensions and angles enable fast, closed-form quantization (see the sketch after this list)
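
Rotation-based reshaping of this kind relies on the fact that an orthogonal transform can be folded into a linear layer's weights without changing its output, so the activation distribution can be reshaped before quantization at no cost to the full-precision computation. Below is a minimal NumPy check of that invariance, with a random orthogonal matrix Q standing in for the paper's structured products of Givens rotations.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random orthogonal matrix stands in for a product of structured Givens rotations.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

W = rng.standard_normal((16, 64))   # weight of a linear layer
x = rng.standard_normal(64)         # an incoming activation vector

# Folding Q into the weights while rotating the activation leaves the output unchanged.
y_ref = W @ x
y_rot = (W @ Q.T) @ (Q @ x)
print(np.allclose(y_ref, y_rot))    # True up to floating-point error
```

Quantization is then applied to the rotated weights and activations, whose distributions are ideally smoother and more uniform than the originals.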

👥 Authors
Jinying Xiao
National University of Defense Technology
Bin Ji
National University of Defense Technology
Shasha Li
National University of Defense Technology
Xiaodong Liu
National University of Defense Technology
Ma Jun
National University of Defense Technology
Ye Zhong
National University of Defense Technology
Wei Li
National University of Defense Technology
Xuan Xie
Macau University of Science and Technology
Trustworthy LLM · Cyber Physical System · Neural Network Verification
Qingbo Wu
University of Electronic Science and Technology of China
Video coding · Image and video quality assessment
Jie Yu
National University of Defense Technology