Efficiently Training a Flat Neural Network Before It Has Been Quantized

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-bit post-training quantization (PTQ) of vision transformers suffers severe accuracy degradation, particularly below 4 bits. Method: This paper proposes Quantization-Aware Flatness Training (QAFT), a PTQ framework built on the observation that the flatness of the full-precision network is a critical determinant of low-bit PTQ robustness. By modeling activation and weight quantization errors as independent Gaussian noise, QAFT injects corresponding noise during training and jointly optimizes the network parameters so that the model intrinsically adapts to the target quantization format. The approach requires no architectural modifications and is model-agnostic. Results: Evaluated on ImageNet, QAFT significantly reduces quantization error across the 2–4-bit PTQ regime. For ViT-B/16, it achieves a +4.2% Top-1 accuracy improvement at 4 bits over baseline PTQ methods. QAFT establishes a new, efficient, and broadly applicable paradigm for low-bit quantization-aware training of vision transformers.

📝 Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained network and its quantized counterpart, leading to considerable quantization error in PTQ. Moreover, it is unclear how to efficiently train a model-agnostic neural network tailored for a predefined low-bit precision. In this paper, we first discover that a flat full-precision neural network is crucial for low-bit quantization. To achieve this, we propose a framework that proactively pre-conditions the model by measuring and disentangling the error sources. Specifically, both the Activation Quantization Error (AQE) and the Weight Quantization Error (WQE) are statistically modeled as independent Gaussian noise. We study several noise-injection optimization methods to obtain a flat minimum. Experimental results attest to the effectiveness of our approach. These results open novel pathways for obtaining low-bit PTQ models.
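The core idea above, treating the weight quantization error as zero-mean Gaussian noise and injecting it during training, can be sketched minimally as follows. This is an illustrative reading, not the paper's implementation: the function names (`quant_step`, `inject_quant_noise`) are invented here, and the noise scale assumes a uniform b-bit quantizer, whose rounding error has standard deviation Δ/√12.

```python
import numpy as np

def quant_step(x, bits):
    # Step size of a uniform quantizer spanning the observed dynamic range.
    return (x.max() - x.min()) / (2 ** bits - 1)

def inject_quant_noise(w, bits, rng):
    # Model the Weight Quantization Error (WQE) as zero-mean Gaussian noise
    # whose std matches the uniform rounding-error model: sigma = delta / sqrt(12).
    delta = quant_step(w, bits)
    sigma = delta / np.sqrt(12.0)
    return w + rng.normal(0.0, sigma, size=w.shape)

# A forward pass during training would use the perturbed weights, so the
# optimizer is pushed toward minima that are flat under quantization-sized noise.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_noisy = inject_quant_noise(w, bits=4, rng=rng)
```

In a full training loop, a fresh noise sample would be drawn per step (for both activations and weights, since the paper models AQE and WQE as independent), while gradients update the clean parameters.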
Problem

Research questions and friction points this paper is trying to address.

Reducing quantization error in vision transformers
Training model-agnostic networks for low-bit precision
Achieving flat neural networks for efficient quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training flat neural networks before quantization
Modeling quantization errors as independent Gaussian noise
Using noise injection for flat minimum optimization