OMPQ: Orthogonal Mixed Precision Quantization

📅 2021-09-16
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 45
Influential: 5
🤖 AI Summary
To address the computationally expensive search process for bit-width allocation in mixed-precision quantization, this paper proposes a search-free post-training quantization method. It employs the orthogonality of layer-wise weight matrices as a differentiable proxy metric to directly guide bit-width assignment across layers. By formulating the problem as a linear program and optimizing orthogonality-based objectives, the approach avoids conventional integer programming and relaxation approximations—enabling zero-iteration, low-data-dependency mixed-precision quantization. Evaluated on ResNet-18 and MobileNetV2, it achieves 72.08% Top-1 accuracy (6.7 MB) and 71.27% Top-1 accuracy (1.5 MB), respectively, while reducing search overhead and calibration data requirements by several orders of magnitude. This work is the first to cast orthogonality as a tractable optimization objective for mixed-precision quantization, significantly improving deployment efficiency and hardware adaptability.
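The bit-allocation step described above can be sketched as a linear program: maximize an orthogonality-weighted precision objective subject to a model-size budget. The sketch below is an illustration under assumed inputs (per-layer `importance` scores standing in for the paper's orthogonality metric, hypothetical parameter counts, and an argmax rounding of the LP relaxation), not the paper's exact formulation.

```python
# Illustrative sketch of search-free bit allocation as a linear program.
# The importance scores, layer sizes, and rounding scheme are assumptions.
import numpy as np
from scipy.optimize import linprog

def allocate_bits(importance, params, bit_choices, budget_bits):
    """Assign one bit-width per layer, maximizing importance-weighted
    precision under a total model-size budget (LP relaxation, rounded
    per layer via argmax)."""
    L, B = len(importance), len(bit_choices)
    # x[l*B + b] in [0, 1]: fraction of layer l assigned bit-width b.
    c = -np.array([importance[l] * bit_choices[b]
                   for l in range(L) for b in range(B)])  # linprog minimizes
    # Each layer picks exactly one bit-width: sum_b x[l, b] = 1.
    A_eq = np.zeros((L, L * B))
    for l in range(L):
        A_eq[l, l * B:(l + 1) * B] = 1.0
    # Total size: sum_{l,b} params[l] * bits[b] * x[l, b] <= budget.
    A_ub = np.array([[params[l] * bit_choices[b]
                      for l in range(L) for b in range(B)]])
    res = linprog(c, A_ub=A_ub, b_ub=[budget_bits],
                  A_eq=A_eq, b_eq=np.ones(L), bounds=(0, 1))
    x = res.x.reshape(L, B)
    return [bit_choices[int(np.argmax(row))] for row in x]
```

For example, with two equal-size layers, candidate bit-widths {2, 8}, and a budget that fits only one 8-bit layer, the layer with the higher score receives 8 bits and the other falls back to 2.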
📝 Abstract
To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, network quantization has attracted more and more research attention. The latest trend of mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization. However, existing approaches rely heavily on an extremely time-consuming search process and various relaxations when seeking the optimal bit configuration. To address this issue, we propose to optimize a proxy metric of network orthogonality that can be efficiently solved with linear programming, which proves to be highly correlated with quantized model accuracy and bit-width. Our approach significantly reduces the search time and the required data amount by orders of magnitude, but without a compromise on quantization accuracy. Specifically, we achieve 72.08% Top-1 accuracy on ResNet-18 with 6.7Mb parameters, which does not require any searching iterations. Given the high efficiency and low data dependency of our algorithm, we use it for the post-training quantization, which achieves 71.27% Top-1 accuracy on MobileNetV2 with only 1.5Mb parameters.
Problem

Research questions and friction points this paper is trying to address.

The time-consuming search for per-layer bit-width configurations in mixed-precision quantization
Heavy search-time and calibration-data requirements of existing quantization methods
Preserving accuracy under efficient post-training quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes a network-orthogonality proxy via linear programming instead of iterative search
Cuts search time and calibration-data requirements by orders of magnitude
Enables efficient post-training mixed-precision quantization
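To make the orthogonality idea concrete, the snippet below sketches one plausible layer-wise proxy: normalize the rows of a weight matrix and measure how far its Gram matrix deviates from the identity. This is an illustrative stand-in under our own assumptions, not the paper's exact metric.

```python
# Illustrative orthogonality proxy for a layer's weight matrix.
# This scoring function is an assumption, not the paper's formula.
import numpy as np

def orthogonality_score(W):
    """Return a score in (0, 1]; 1.0 means the rows of W are orthonormal
    after normalization. Lower scores indicate more row redundancy."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-length rows
    gram = W @ W.T
    deviation = np.linalg.norm(gram - np.eye(W.shape[0]))  # Frobenius norm
    return 1.0 / (1.0 + deviation)
```

Under this sketch, a layer whose rows are nearly parallel (redundant) scores low and could tolerate fewer bits, while a near-orthogonal layer scores high and would be assigned a wider bit-width.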
Authors

Yuexiao Ma — Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University, China
Taisong Jin — Assistant Professor, Xiamen University (Graph Neural Networks)
Xiawu Zheng — Associate Professor, IEEE Senior Member, Xiamen University (Automated Machine Learning, Network Compression, Neural Architecture Search, AutoML)
Yan Wang — Samsara, Seattle, WA, USA
Huixia Li — Media Analytics and Computing Lab, Department of Computer Science and Technology, School of Informatics, Xiamen University, China
Guannan Jiang — CATL; UNSW; XMU (Image Processing, Computer Vision, Optimization)
Wei Zhang — CATL, China
Rongrong Ji — Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University, China