QuEPT: Quantized Elastic Precision Transformers with One-Shot Calibration for Multi-Bit Switching

📅 2026-02-13
📈 Citations: 0
Influential: 0

📝 Abstract
Elastic precision quantization enables multi-bit deployment via a single optimization pass, fitting diverse quantization scenarios. Yet, owing to the high storage and optimization costs associated with the Transformer architecture, research on elastic quantization remains limited, particularly for large language models. This paper proposes QuEPT, an efficient post-training scheme that reconstructs block-wise multi-bit errors with one-shot calibration on a small data slice. It can dynamically adapt to various predefined bit-widths by cascading different low-rank adapters, and it supports real-time switching between uniform quantization and mixed-precision quantization without repeated optimization. To enhance accuracy, we introduce Multi-Bit Token Merging (MB-ToMe), which dynamically fuses token features across different bit-widths, improving robustness during bit-width switching. Additionally, we propose Multi-Bit Cascaded Low-Rank adapters (MB-CLoRA) to strengthen correlations between bit-width groups, further improving the overall performance of QuEPT. Extensive experiments demonstrate that QuEPT achieves performance comparable or superior to existing state-of-the-art post-training quantization methods. Our code is available at https://github.com/xuke225/QuEPT
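The abstract describes one shared model that serves several bit-widths through cascaded low-rank adapters, calibrated by reconstructing multi-bit errors block by block. The following is a minimal NumPy sketch of that idea on a single linear layer; it is not the authors' implementation. The names `fake_quantize`, `ElasticLinear`, and `multi_bit_reconstruction_loss`, the symmetric per-row quantizer, the bit-widths (8, 6, 4), the cascade direction (a lower bit-width reuses the adapters of every higher bit-width and adds its own), merging the adapters into the weight before quantization, and the equal-weight averaged MSE loss are all assumptions made for illustration.

```python
import numpy as np


def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


class ElasticLinear:
    """Hypothetical elastic-precision linear layer (illustrative, not QuEPT's code).

    One shared full-precision weight plus one low-rank adapter (A_b, B_b) per
    supported bit-width. Assumption: adapters are cascaded, so the correction
    at bit-width b is the sum of the adapters of all bit-widths >= b.
    """

    def __init__(self, weight: np.ndarray, bit_widths=(8, 6, 4), rank: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight
        self.bit_widths = sorted(bit_widths, reverse=True)  # highest precision first
        # LoRA-style init: B = 0, so every adapter starts as a no-op.
        self.adapters = {
            b: (0.01 * rng.standard_normal((rank, in_features)), np.zeros((out_features, rank)))
            for b in self.bit_widths
        }
        self.bits = self.bit_widths[0]

    def set_bits(self, bits: int) -> None:
        """Switch precision at run time; no re-optimization is performed."""
        if bits not in self.adapters:
            raise ValueError(f"unsupported bit-width: {bits}")
        self.bits = bits

    def delta(self) -> np.ndarray:
        """Cascaded low-rank correction for the current bit-width."""
        return sum(B @ A for b, (A, B) in self.adapters.items() if b >= self.bits)

    def effective_weight(self) -> np.ndarray:
        # Assumption: the correction is merged into the weight before quantizing.
        return fake_quantize(self.weight + self.delta(), self.bits)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.effective_weight().T


def multi_bit_reconstruction_loss(layer: ElasticLinear, x: np.ndarray) -> float:
    """Output MSE against full precision, averaged over all supported bit-widths."""
    target = x @ layer.weight.T
    previous = layer.bits
    losses = []
    for b in layer.bit_widths:
        layer.set_bits(b)
        losses.append(float(np.mean((layer(x) - target) ** 2)))
    layer.set_bits(previous)  # restore the caller's precision
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    layer = ElasticLinear(rng.standard_normal((16, 32)))
    calib = rng.standard_normal((8, 32))  # stand-in for a small calibration slice
    for b in (8, 6, 4):
        layer.set_bits(b)
        print(b, float(np.mean((layer(calib) - calib @ layer.weight.T) ** 2)))
    print("multi-bit loss:", multi_bit_reconstruction_loss(layer, calib))
```

Under the same assumptions, calling `set_bits` with different values on different layers would yield a mixed-precision configuration from the same stored weights. In an actual calibration loop the adapters would be trained (for example, with an autodiff framework) to minimize the multi-bit loss on the calibration slice; this NumPy sketch only evaluates that loss.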
Problem

Research questions and friction points this paper is trying to address.

elastic precision quantization
multi-bit switching
Transformer
post-training quantization
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

elastic precision quantization
one-shot calibration
multi-bit switching
low-rank adapters
post-training quantization
Ke Xu
Anhui University
Deep Learning, Network Quantization, Network Pruning, Neural Architecture Search, FPGA
Yixin Wang
School of Artificial Intelligence, Anhui University, Hefei, China
Zhongcheng Li
iFLYTEK Research, Hefei, China
Hao Cui
University of California, Irvine
privacy policy, image watermarking
Jinshui Hu
iFLYTEK Research, Hefei, China
Xingyi Zhang
MBZUAI
graph representation learning, AI4Science, geometric deep learning