Starting Positions Matter: A Study on Better Weight Initialization for Neural Network Quantization

πŸ“… 2025-06-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the accuracy degradation of low-bit quantized models by improving the initial weights used for quantization-aware training (QAT). It presents the first in-depth study of how weight initialization affects quantization robustness, showing that across a variety of efficient CNN building blocks, the choice of random weight initializer can significantly change final quantized accuracy. Building on this analysis, the authors propose GHN-QAT: a Graph Hypernetwork (GHN) finetuned to predict parameters for quantized computation graphs. GHN-predicted parameters are already quantization-robust after ordinary float32 pretraining of the GHN, and GHN-QAT finetuning improves quantized accuracy further, yielding significant gains under 4-bit quantization and better-than-random accuracy even at 2 bits. This points toward a new approach for efficient, robust training of low-bit models on edge devices.

πŸ“ Abstract
Deep neural network (DNN) quantization for fast, efficient inference has been an important tool in limiting the cost of machine learning (ML) model inference. Quantization-specific model development techniques such as regularization, quantization-aware training, and quantization-robustness penalties have served to greatly boost the accuracy and robustness of modern DNNs. However, very little exploration has been done on improving the initial conditions of DNN training for quantization. Just as random weight initialization has been shown to significantly impact test accuracy of floating point models, it would make sense that different weight initialization methods impact quantization robustness of trained models. We present an extensive study examining the effects of different weight initializations on a variety of CNN building blocks commonly used in efficient CNNs. This analysis reveals that even with varying CNN architectures, the choice of random weight initializer can significantly affect final quantization robustness. Next, we explore a new method for quantization-robust CNN initialization -- using Graph Hypernetworks (GHN) to predict parameters of quantized DNNs. Besides showing that GHN-predicted parameters are quantization-robust after regular float32 pretraining (of the GHN), we find that finetuning GHNs to predict parameters for quantized graphs (which we call GHN-QAT) can further improve quantized accuracy of CNNs. Notably, GHN-QAT shows significant accuracy improvements for even 4-bit quantization and better-than-random accuracy for 2-bits. To the best of our knowledge, this is the first in-depth study on quantization-aware DNN weight initialization. GHN-QAT offers a novel approach to quantized DNN model design. Future investigations, such as using GHN-QAT-initialized parameters for quantization-aware training, can further streamline the DNN quantization process.
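The abstract centers on how well weights survive quantization. In QAT-style evaluation, each weight tensor is quantized and then dequantized ("fake" quantization), so the model runs in float but with weights restricted to a low-bit grid. Below is a minimal NumPy sketch of uniform symmetric fake quantization, the kind of simulated-quantization step that initialization robustness is measured against; the function name and details are illustrative assumptions, not the paper's code:

```python
import numpy as np

def fake_quantize(w, bits):
    # Simulate uniform symmetric quantization of weights `w` to `bits` bits:
    # quantize to an integer grid, then dequantize back to float.
    # (Illustrative sketch, not the paper's implementation.)
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.max(np.abs(w)) / qmax    # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # weights now take few distinct values
```

At 4 bits the round-trip error per weight is at most half a quantization step; at 2 bits the grid collapses to roughly {-scale, 0, +scale}, which is why initialization quality matters so much more in the extreme low-bit regime the abstract describes.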
Problem

Research questions and friction points this paper is trying to address.

How does the choice of random weight initialization affect the quantization robustness of trained DNNs?
Can Graph Hypernetworks (GHNs) provide quantization-robust initial parameters for CNNs?
Can finetuning GHNs on quantized graphs (GHN-QAT) improve the accuracy of low-bit quantized DNNs?
Innovation

Methods, ideas, or system contributions that make the work stand out.

First in-depth study of quantization-aware DNN weight initialization, across common efficient-CNN building blocks
GHN-QAT: Graph Hypernetworks finetuned to predict parameters for quantized computation graphs
Significant accuracy gains at 4-bit quantization and better-than-random accuracy at 2 bits