🤖 AI Summary
To address the significant accuracy degradation in data-free quantization of Vision Transformers (ViTs) caused by semantically impoverished synthetic images, this paper proposes SPDFQ, a semantics-prompting data-free low-bit quantization method. The approach introduces three key components: (1) Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images; (2) Multi-Semantic Reinforcement (MSR), which applies localized patch-wise optimization to promote diverse and discriminative semantics; and (3) Softlabel Learning (SL), which adapts soft learning targets to improve the reliability of pseudo-labels. Evaluated on ImageNet with ViT-B under W4A4 quantization, the method achieves 78.2% top-1 accuracy, surpassing the prior state-of-the-art data-free approach by 15.52 percentage points. Notably, it is reported as the first data-free method at such ultra-low bit-widths to exceed the accuracy of its data-dependent counterpart, thereby simultaneously offering privacy preservation, computational efficiency, and competitive model performance.
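To make the APA idea concrete, the sketch below shows one plausible way an attention-alignment objective could look: a KL divergence that pulls a model's attention distribution over image patches toward a randomly generated prior. This is a hypothetical illustration, not the paper's actual loss; the function name `apa_loss`, the logit shapes, and the choice of KL are all assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def apa_loss(model_attn_logits, prior_logits):
    """Hypothetical APA-style objective: KL(prior || model attention),
    averaged over heads. Inputs are (num_heads, num_patches) logits."""
    p = softmax(prior_logits)       # randomly generated attention prior
    q = softmax(model_attn_logits)  # model's attention on the synthetic image
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

rng = np.random.default_rng(0)
prior = rng.normal(size=(12, 196))  # e.g. 12 heads, 14x14 = 196 patches
attn = rng.normal(size=(12, 196))
loss = apa_loss(attn, prior)        # non-negative; 0 iff distributions match
```

Minimizing such a loss with respect to the synthetic image would push its attention maps toward the sampled prior, which is one way to encourage foreground-like structure.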
📝 Abstract
Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community. Recently, the unique architecture of vision transformers (ViTs) has driven the development of specialized DFQ techniques. However, we observe that the synthetic images from existing methods suffer from deficient semantics compared to real images, thereby compromising performance. Motivated by this, we propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs. First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images. Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization and diverse semantics in synthetic images. Finally, SPDFQ employs Softlabel Learning (SL), where soft learning targets are adapted to encourage more complex semantics and accommodate images augmented by MSR. Experimental results demonstrate that SPDFQ significantly outperforms existing methods. For instance, SPDFQ achieves a 15.52% increase in top-1 accuracy on ImageNet for W4A4 ViT-B.
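For readers unfamiliar with the W4A4 setting, the sketch below shows a generic symmetric uniform fake-quantizer restricted to 4 bits, i.e., 16 representable levels for both weights and activations. This is a minimal textbook illustration of what "W4A4" means, not the quantizer used in the paper; the per-tensor scale and the function name are assumptions.

```python
import numpy as np

def quantize_symmetric(x, n_bits=4):
    """Symmetric uniform fake-quantization to n_bits (4 for W4A4).
    Quantizes then dequantizes, so the output stays in float for simulation."""
    qmax = 2 ** (n_bits - 1) - 1                 # 7 for signed 4-bit
    scale = max(np.abs(x).max(), 1e-8) / qmax    # per-tensor scale (a simple choice)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer grid [-8, 7]
    return q * scale

w = np.random.default_rng(1).normal(size=(8, 8)).astype(np.float32)
w_q = quantize_symmetric(w, n_bits=4)            # at most 16 distinct values
```

At 4 bits the grid is so coarse that naive quantization loses substantial accuracy, which is why calibration data (real or, as here, synthetic) is needed to tune scales and recover performance.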