Efficiently Training a Flat Neural Network Before It Has Been Quantized

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-bit post-training quantization (PTQ) of vision transformers suffers severe accuracy degradation, particularly below 4 bits. Method: This paper proposes Quantization-Aware Flatness Training (QAFT), a PTQ framework built on the observation that the flatness of the full-precision network is a critical determinant of low-bit PTQ robustness. By modeling activation and weight quantization errors as independent Gaussian noise, QAFT injects corresponding noise during training and jointly optimizes the network parameters so that the model intrinsically adapts to the target quantization format. The approach requires no architectural modifications and is model-agnostic. Results: Evaluated on ImageNet, QAFT significantly reduces quantization error across the 2–4-bit PTQ regime. For ViT-B/16, it achieves a +4.2% Top-1 accuracy improvement at 4 bits over baseline PTQ methods. QAFT establishes a new, efficient, and broadly applicable paradigm for low-bit quantization-aware training of vision transformers.

📝 Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained network and its quantized counterpart, leading to considerable quantization error in PTQ. Moreover, it is unclear how to efficiently train a model-agnostic neural network tailored for a predefined low-bit precision. In this paper, we first discover that a flat full-precision neural network is crucial for low-bit quantization. To achieve this, we propose a framework that proactively pre-conditions the model by measuring and disentangling the error sources. Specifically, both the Activation Quantization Error (AQE) and the Weight Quantization Error (WQE) are statistically modeled as independent Gaussian noise. We study several noise-injection optimization methods to obtain a flat minimum. Experimental results attest to the effectiveness of our approach. These results open novel pathways for obtaining low-bit PTQ models.
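The core idea above, treating the weight quantization error as zero-mean Gaussian noise and injecting it during training, can be sketched minimally as follows. This is an illustrative reading, not the paper's implementation: the function names (`quant_step`, `inject_quant_noise`) are invented here, and the noise scale assumes a uniform b-bit quantizer, whose rounding error has standard deviation Δ/√12.

```python
import numpy as np

def quant_step(x, bits):
    # Step size of a uniform quantizer spanning the observed dynamic range.
    return (x.max() - x.min()) / (2 ** bits - 1)

def inject_quant_noise(w, bits, rng):
    # Model the Weight Quantization Error (WQE) as zero-mean Gaussian noise
    # whose std matches the uniform rounding-error model: sigma = delta / sqrt(12).
    delta = quant_step(w, bits)
    sigma = delta / np.sqrt(12.0)
    return w + rng.normal(0.0, sigma, size=w.shape)

# A forward pass during training would use the perturbed weights, so the
# optimizer is pushed toward minima that are flat under quantization-sized noise.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_noisy = inject_quant_noise(w, bits=4, rng=rng)
```

In a full training loop, a fresh noise sample would be drawn per step (for both activations and weights, since the paper models AQE and WQE as independent), while gradients update the clean parameters.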
Problem

Research questions and friction points this paper is trying to address.

Reducing quantization error in vision transformers
Training model-agnostic networks for low-bit precision
Achieving flat neural networks for efficient quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training flat neural networks before quantization
Modeling quantization errors as independent Gaussian noise
Using noise injection for flat minimum optimization