LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in Vision Transformer (ViT) quantization (coarse granularity, inconsistent metric scales across components such as attention vs. MLP, and quantization-unaware bit-width allocation), this paper proposes LampQ, a layer-wise mixed-precision quantization framework. Methodologically, LampQ (1) introduces a type-aware Fisher information-based sensitivity metric to unify quantization sensitivity evaluation across heterogeneous modules; (2) formulates an integer linear program to optimize the per-layer bit-width assignment; and (3) refines the assigned bit-widths iteratively for fine-grained, task-adaptive layer-wise quantization. Evaluated on image classification, object detection, and zero-shot quantization, LampQ achieves state-of-the-art accuracy under significant reductions in computational and memory cost, balancing compression ratio and model fidelity while enabling efficient deployment.
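The type-aware sensitivity metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a diagonal-Fisher proxy (squared gradient times quantization error) for per-layer sensitivity and then normalizes scores within each component type so that attention and MLP layers become comparable. All layer names, tensor shapes, and values are hypothetical.

```python
import numpy as np

def uniform_quantize(w, bits):
    # Uniform quantizer over the tensor's full [min, max] range.
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((w - lo) / scale) * scale

def fisher_sensitivity(w, g, bits):
    # Diagonal-Fisher proxy: mean of (gradient * quantization error)^2.
    dw = w - uniform_quantize(w, bits)
    return float(np.mean((g * dw) ** 2))

rng = np.random.default_rng(0)
# Hypothetical layers: (component type, weights) with different scales,
# mimicking the metric-scale mismatch between attention and MLP modules.
layers = {
    "attn.qkv":  ("attn", rng.normal(0, 0.02, 1000)),
    "attn.proj": ("attn", rng.normal(0, 0.02, 1000)),
    "mlp.fc1":   ("mlp",  rng.normal(0, 0.10, 1000)),
    "mlp.fc2":   ("mlp",  rng.normal(0, 0.10, 1000)),
}
raw = {name: fisher_sensitivity(w, rng.normal(0, 1, w.shape), bits=4)
       for name, (_, w) in layers.items()}

# Type-aware step: divide each score by the mean score of its component type,
# putting attention and MLP sensitivities on a comparable scale.
by_type = {}
for name, (t, _) in layers.items():
    by_type.setdefault(t, []).append(raw[name])
normalized = {name: raw[name] / np.mean(by_type[layers[name][0]])
              for name in raw}
```

Without the per-type normalization, the MLP layers' larger weight magnitudes would dominate the raw scores regardless of how sensitive the attention layers actually are.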

📝 Abstract
How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However, existing methods rely on uniform precision, ignoring the diverse sensitivity of ViT components to quantization. Metric-based Mixed Precision Quantization (MPQ) is a promising alternative, but previous MPQ methods for ViTs suffer from three major limitations: 1) coarse granularity, 2) mismatch in metric scale across component types, and 3) quantization-unaware bit allocation. In this paper, we propose LampQ (Layer-wise Mixed Precision Quantization for Vision Transformers), an accurate metric-based MPQ method for ViTs that overcomes these limitations. LampQ performs layer-wise quantization to achieve both fine-grained control and efficient acceleration, incorporating a type-aware Fisher-based metric to measure sensitivity. Then, LampQ assigns bit-widths optimally through integer linear programming and further updates them iteratively. Extensive experiments show that LampQ provides state-of-the-art performance in quantizing ViTs pre-trained on various tasks such as image classification, object detection, and zero-shot quantization.
Problem

Research questions and friction points this paper is trying to address.

Accurately quantizing pre-trained Vision Transformers with minimal accuracy loss
Overcoming uniform precision limitations in Vision Transformer quantization methods
Addressing granularity and metric mismatch in mixed precision quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise mixed precision quantization for Vision Transformers
Type-aware Fisher-based metric measures sensitivity accurately
Integer linear programming assigns bit-widths, refined iteratively
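The bit-allocation step above can be illustrated with a tiny stand-in for the paper's integer linear program: an exhaustive search over candidate bit-widths that minimizes total sensitivity subject to a memory budget. A real implementation would hand the same objective and constraint to an ILP solver (e.g., SciPy's `optimize.milp`); all sensitivity scores, layer names, and parameter counts below are hypothetical.

```python
from itertools import product

# Hypothetical per-layer sensitivity: sens[layer][bits] approximates the
# accuracy loss incurred if that layer is quantized to `bits` bits.
sens = {
    "attn.qkv":  {4: 0.90, 6: 0.30, 8: 0.05},
    "attn.proj": {4: 0.40, 6: 0.15, 8: 0.03},
    "mlp.fc1":   {4: 0.20, 6: 0.08, 8: 0.02},
    "mlp.fc2":   {4: 0.25, 6: 0.10, 8: 0.02},
}
params = {"attn.qkv": 3.0, "attn.proj": 1.0, "mlp.fc1": 4.0, "mlp.fc2": 4.0}
budget_bits = 6.0 * sum(params.values())  # average of 6 bits per weight

layers = list(sens)
best = None
for choice in product([4, 6, 8], repeat=len(layers)):
    size = sum(b * params[l] for b, l in zip(choice, layers))
    if size > budget_bits:
        continue  # violates the model-size constraint
    cost = sum(sens[l][b] for b, l in zip(choice, layers))
    if best is None or cost < best[0]:
        best = (cost, dict(zip(layers, choice)))

print(best[1])
```

Note how the optimum spends the budget where sensitivity drops fastest: the attention layers get 8 bits while the less sensitive MLP layers absorb the lower precisions.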
Minjun Kim
Department of Computer Science and Engineering, Seoul National University
Jaeri Lee
Interdisciplinary Program in Artificial Intelligence, Seoul National University
Jongjin Kim
Data Mining Lab, Seoul National University
Jeongin Yun
Interdisciplinary Program in Artificial Intelligence, Seoul National University
Yongmo Kwon
Interdisciplinary Program in Artificial Intelligence, Seoul National University
U. Kang
Department of Computer Science and Engineering and Interdisciplinary Program in Artificial Intelligence, Seoul National University