Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers

📅 2024-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the severe accuracy degradation that data-free quantization of Vision Transformers (ViTs) suffers when synthetic calibration images are semantically impoverished, this paper proposes SPDFQ, a semantics-prompting data-free low-bit quantization method. It introduces three key components: (1) Attention Priors Alignment (APA), which aligns the attention distributions of synthetic images with randomly generated attention priors to enrich their semantics; (2) Multi-Semantic Reinforcement (MSR), which uses localized patch-wise optimization to promote semantic diversity and discriminability; and (3) Softlabel Learning (SL), which adapts soft learning targets to improve pseudo-label reliability and accommodate MSR-augmented images. Evaluated on ImageNet with ViT-B under W4A4 quantization, the method achieves 78.2% top-1 accuracy, surpassing the prior state-of-the-art data-free approach by 15.52 percentage points, and is reported as the first data-free method to exceed its data-driven counterpart at ultra-low bit-widths, thereby combining privacy preservation, computational efficiency, and competitive accuracy.
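To give a rough feel for the APA idea, the abstract below describes aligning the patch-attention distribution of each synthetic image with a randomly generated attention prior. The sketch below is an illustration only, not the paper's implementation: the Gaussian-bump prior shape and the KL-divergence objective are assumptions made for this example.

```python
import math
import random

def gaussian_prior(grid, cx, cy, sigma=0.25):
    # Randomly placed Gaussian bump over a grid x grid patch map,
    # normalized into a probability distribution (hypothetical prior shape).
    raw = [
        math.exp(-((x / (grid - 1) - cx) ** 2 + (y / (grid - 1) - cy) ** 2)
                 / (2 * sigma ** 2))
        for y in range(grid)
        for x in range(grid)
    ]
    total = sum(raw)
    return [v / total for v in raw]

def apa_loss(attn, prior, eps=1e-12):
    # KL(prior || attn): penalizes synthetic images whose patch-attention
    # distribution strays from the sampled prior.
    return sum(p * math.log((p + eps) / (a + eps)) for p, a in zip(prior, attn))

random.seed(0)
prior = gaussian_prior(14, random.random(), random.random())  # ViT-B: 14x14 patches
uniform_attn = [1.0 / 196] * 196
print(apa_loss(prior, prior))             # ~0: attention already matches the prior
print(apa_loss(uniform_attn, prior) > 0)  # True: flat, semantics-free attention is penalized
```

During image synthesis such a term would be minimized jointly with the usual batch-norm or output-matching objectives, pushing each synthetic image to develop a localized, object-like attention pattern.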

📝 Abstract
Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community. Recently, the unique architecture of vision transformers (ViTs) has driven the development of specialized DFQ techniques. However, we observe that the synthetic images from existing methods suffer from the deficient semantics issue compared to real images, thereby compromising performance. Motivated by this, we propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs. First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images. Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization and diverse semantics in synthetic images. Finally, SPDFQ employs Softlabel Learning (SL), where soft learning targets are adapted to encourage more complex semantics and accommodate images augmented by MSR. Experimental results demonstrate that SPDFQ significantly outperforms existing methods. For instance, SPDFQ achieves a 15.52% increase in top-1 accuracy on ImageNet for W4A4 ViT-B.
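For context on the "W4A4" setting: both weights and activations are quantized to 4 bits, leaving only 16 representable levels per tensor. A minimal symmetric uniform fake-quantizer in plain Python illustrates the idea; this is a generic sketch, not the quantizer used in the paper.

```python
def fake_quantize(values, bits=4):
    # Symmetric uniform quantization: scale floats onto the signed integer
    # grid [-2^(b-1), 2^(b-1)-1], then map back to floats ("fake" quantization).
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit signed
    scale = max(abs(v) for v in values) / qmax or 1.0
    quants = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quants], scale

weights = [0.52, -1.0, 0.03, 0.77]
dq, scale = fake_quantize(weights)                  # W4: at most 16 distinct levels
# Rounding error per value stays within half a quantization step:
print(max(abs(w - d) for w, d in zip(weights, dq)) <= scale / 2 + 1e-9)  # True
```

At such coarse grids, calibration data is needed to set good scales per layer, which is exactly why the quality of DFQ's synthetic images matters so much.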
Problem

Research questions and friction points this paper is trying to address.

- Privacy Preservation
- Vision Transformers (ViTs)
- Quantization Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

- SPDFQ Method
- Random Attention Mechanism
- Soft Label Learning
👥 Authors

- Yunshan Zhong (Hainan University)
- Yuyao Zhou (MAC Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University)
- Yuxin Zhang (MAC Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University)
- Shen Li (Alibaba)
- Yong Li (Alibaba)
- Fei Chao (MAC Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University)
- Zhanpeng Zeng (University of Wisconsin-Madison)
- Rongrong Ji (Institute of Artificial Intelligence, Xiamen University; MAC Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University)