Low-bit Model Quantization for Deep Neural Networks: A Survey

📅 2025-05-08
🤖 AI Summary
Deploying deep neural networks (DNNs) faces challenges from high computational overhead and large model sizes. While low-bit weight quantization accelerates inference and reduces memory bandwidth requirements, it often incurs substantial accuracy degradation. This paper presents a systematic survey of low-bit weight quantization research from 2019 to 2024. We propose the first unified taxonomy comprising eight major categories and 24 subcategories, covering linear/nonlinear quantization, layer-wise/channel-wise calibration, retraining-free and fine-tuning-based paradigms, gradient approximation techniques, and mixed-precision search strategies. Through a structured comparative analysis of over 100 state-of-the-art works, we identify common bottlenecks, clarify promising future directions, and highlight open challenges. To foster reproducibility and industrial adoption, we open-source Awesome-Model-Quantization, a curated and continuously updated resource repository, thereby advancing the standardization and practical deployment of quantization techniques.

📝 Abstract
With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and large model sizes are often unacceptable for real-world deployment. Model quantization, an effective model light-weighting technique, has become an indispensable step in the deployment pipeline. The essence of quantization acceleration is the conversion from continuous floating-point numbers to discrete integers, which significantly speeds up memory I/O and computation, i.e., addition and multiplication. However, this conversion also degrades performance because of the loss of precision. Therefore, it has become increasingly popular and critical to investigate how to perform the conversion and how to compensate for the information loss. This article surveys the last five years of progress on low-bit quantization of DNNs. We discuss and compare state-of-the-art quantization methods and classify them into 8 main categories and 24 sub-categories according to their core techniques. Furthermore, we shed light on potential research opportunities in the field of model quantization. A curated list of model quantization resources is provided at https://github.com/Kai-Liu001/Awesome-Model-Quantization.
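The float-to-integer conversion the abstract describes can be made concrete with a minimal sketch of symmetric uniform quantization; the helper names below are illustrative, not taken from the paper.

```python
def quantize(weights, num_bits=8):
    """Symmetric uniform quantization: map floats to signed integer codes."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]   # discrete integer codes
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the rounding error is the precision loss."""
    return [qi * scale for qi in q]

weights = [0.52, -1.30, 0.07, 0.91]
q, scale = quantize(weights)
recovered = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q)        # → [51, -127, 7, 89]
print(max_err)  # round-trip error is bounded by scale / 2
```

The integer codes can be stored and multiplied far more cheaply than 32-bit floats, while the per-element error stays within half a quantization step; the survey's methods are, in essence, ways of choosing the mapping and compensating for this residual error.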
Problem

Research questions and friction points this paper is trying to address.

Reducing computation costs and model sizes in DNN deployment
Minimizing performance degradation from low-bit quantization
Surveying and classifying state-of-the-art quantization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts floating-point numbers to discrete integers
Compensates for precision loss in quantization
Classifies quantization methods into 8 categories
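One of the calibration axes in the taxonomy, per-tensor versus per-channel (channel-wise) scaling, can be illustrated with a small sketch; the values and function names are hypothetical, chosen only to show why finer-grained calibration compensates for precision loss when channel magnitudes differ.

```python
def quant_error(channel, scale):
    """Mean absolute round-trip error for one channel at a given scale."""
    return sum(abs(w - round(w / scale) * scale) for w in channel) / len(channel)

QMAX = 127  # signed 8-bit range

# Two channels with very different magnitudes (a common failure case).
channels = [[0.010, -0.008, 0.005], [2.0, -1.5, 1.8]]

# Per-tensor: one scale from the global maximum swamps the small channel.
tensor_scale = max(abs(w) for ch in channels for w in ch) / QMAX
err_tensor = sum(quant_error(ch, tensor_scale) for ch in channels)

# Per-channel: each channel gets its own scale from its own maximum.
err_channel = sum(quant_error(ch, max(abs(w) for w in ch) / QMAX)
                  for ch in channels)

print(err_channel < err_tensor)  # True: finer calibration, lower error
```

The small-magnitude channel loses almost all of its resolution under the shared scale, which is why channel-wise calibration is a recurring technique across the surveyed methods.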
Kai Liu
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Qian Zheng
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Kaiwen Tao
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Zhiteng Li
Shanghai Jiao Tong University
Large Language Models · Model Compression · Computer Vision
Haotong Qin
ETH Zürich
TinyML · Model Compression · Computer Vision · Deep Learning
Wenbo Li
The Chinese University of Hong Kong
Computer Vision · Deep Learning
Yong Guo
Huawei Consumer Business Group, China
Xianglong Liu
Beihang University, China
Linghe Kong
Shanghai Jiao Tong University
Internet of Things · Mobile computing · Big data
Guihai Chen
Professor of Computer Science
Computer Science and Technology
Yulun Zhang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Xiaokang Yang
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China