BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models

📅 2025-05-26
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing rotation-based quantization methods suffer from two critical flaws: (i) channel-wise mean shifts after rotation, which widen quantization ranges and exacerbate rounding errors; and (ii) excessive Gaussianization of activation distributions, which intensifies clipping-induced energy loss. This work is the first to formally identify and analyze these mechanisms. We propose a synergistic optimization framework that integrates bias correction with layer-adaptive asymmetric scaling and supports block-wise training to reduce GPU memory overhead. The method comprises three core components: rotation matrix preprocessing, channel-level bias compensation, and dynamic asymmetric quantization scaling, enabling source-level error suppression and decoupled training. Evaluated across multiple large language models and standard benchmarks, the approach reduces accuracy degradation by 50.5%, 42.9%, and 29.2% relative to QuaRot, SpinQuant, and OSTQuant, respectively, while also improving the speed and memory footprint of quantization-aware fine-tuning.

๐Ÿ“ Abstract
Rotations have become essential to state-of-the-art quantization pipelines for large language models (LLMs), as they effectively smooth outliers in weights and activations. However, further optimizing the rotation parameters offers only limited performance gains and introduces significant training overhead: because rotation parameters are shared across layers, the full model must be loaded simultaneously to enable backpropagation, resulting in substantial memory consumption and limited practical utility. In this work, we identify two fundamental limitations of current rotational quantization methods: (i) rotation fails to align channel means, resulting in wider quantization bounds and increased rounding errors; and (ii) rotation makes the activation distribution more Gaussian-like, increasing the energy loss caused by clipping errors. To address these issues, we introduce BASE-Q, a simple yet powerful approach that combines bias correction and asymmetric scaling to effectively reduce rounding and clipping errors. Furthermore, BASE-Q enables blockwise optimization, eliminating the need for memory-intensive full-model backpropagation. Extensive experiments on various LLMs and benchmarks demonstrate the effectiveness of BASE-Q, narrowing the accuracy gap to full-precision models by 50.5%, 42.9%, and 29.2% compared to QuaRot, SpinQuant, and OSTQuant, respectively. The code will be released soon.
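To make limitation (i) concrete, here is a minimal NumPy sketch (my own illustration, not from the paper; the `symmetric_quantize` helper and the 4-bit setting are assumptions) showing how a nonzero channel mean widens a symmetric quantizer's range and inflates rounding error, and how subtracting the mean first shrinks it:

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_quantize(x, bits=4):
    # Symmetric uniform quantization: the scale is set by the largest
    # magnitude, so any mean shift directly widens the quantization range.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

# A channel whose mean is shifted away from zero (as can happen after rotation).
x = rng.normal(loc=2.0, scale=1.0, size=4096)

err_shifted = np.mean((x - symmetric_quantize(x)) ** 2)

# Centering the channel first (the bias-correction idea) narrows the range,
# which shrinks the step size and hence the rounding error.
mu = x.mean()
err_centered = np.mean(((x - mu) - symmetric_quantize(x - mu)) ** 2)

print(err_shifted, err_centered)
```

On this toy data the mean-squared rounding error of the centered channel is noticeably smaller, since the quantizer no longer spends range on the offset.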
Problem

Research questions and friction points this paper is trying to address.

Reduces rounding errors from unaligned channel means in quantization
Minimizes clipping errors from Gaussian-like activation distributions
Enables blockwise optimization to avoid memory-intensive full-model training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bias correction reduces rounding errors
Asymmetric scaling minimizes clipping errors
Blockwise optimization avoids full-model backpropagation
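As an illustration of how channel-level bias compensation can preserve a layer's output in principle, the sketch below (a toy construction under my own assumptions, not the paper's implementation; `asym_quantize`, the layer shapes, and the calibration scheme are all illustrative) subtracts per-channel means before asymmetric quantization and folds them into the following linear layer's bias:

```python
import numpy as np

rng = np.random.default_rng(1)

def asym_quantize(x, bits=4):
    # Asymmetric uniform quantization: min/max endpoints and a zero offset,
    # so a skewed or shifted distribution wastes no range.
    qmax = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax
    q = np.round((x - lo) / scale).clip(0, qmax)
    return q * scale + lo

# Toy linear layer: y = x @ W + b, with a per-channel mean shift in x.
d_in, d_out = 64, 32
W = rng.normal(size=(d_in, d_out)) * 0.1
b = rng.normal(size=d_out)
x = rng.normal(loc=1.5, size=(8, d_in))

# Bias compensation: subtract (calibration) channel means before quantizing,
# and absorb them into the layer bias so the computation is unchanged.
mu = x.mean(axis=0)      # per-channel means from a calibration set
b_comp = b + mu @ W      # compensated bias absorbs the shift exactly

y_ref = x @ W + b
y_q = asym_quantize(x - mu) @ W + b_comp

print(np.abs(y_q - y_ref).max())  # only quantization error remains
```

The key point is that the rewrite `x @ W + b == (x - mu) @ W + (b + mu @ W)` is exact, so the compensation costs nothing at inference while the quantizer sees a centered input.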
Authors
Liulu He, Nanjing University
Shenli Zhen, Nanjing University
Karwei Sun, Nanjing University
Yijiang Liu, PhD, Machine Learning Efficiency
Yufei Zhao, Nanjing University
Chongkang Tan, Alibaba Group
Huanrui Yang, Assistant Professor, ECE, University of Arizona (Efficient deep learning; Trustworthy deep learning)
Yuan Du, Nanjing University
Li Du, Nanjing University