🤖 AI Summary
To address the challenge of fine-grained backdoor triggers in data poisoning attacks—whose removal is difficult and often compromises semantic fidelity—this paper proposes a defense based on a vector-quantized bottleneck. Specifically, it jointly trains a Vector-Quantized Variational Autoencoder (VQ-VAE) with a GAN discriminator to perform semantics-preserving, quantization-based purification of poisoned images. This approach disrupts trigger patterns while enforcing adherence of outputs to the natural image distribution. Evaluated on CIFAR-10, the method achieves a 0% poison success rate (PSR) against Gradient Matching and Bullseye Polytope attacks (1.64% against Narcissus), maintains clean accuracy between 91% and 95%, and runs over 50× faster than diffusion-based defenses. The framework thus simultaneously delivers strong robustness against backdoor attacks, high-fidelity reconstruction, and superior computational efficiency.
📝 Abstract
We introduce PureVQ-GAN, a defense against data poisoning that forces backdoor triggers through a discrete bottleneck using a Vector-Quantized VAE (VQ-VAE) with a GAN discriminator. By quantizing poisoned images through a learned codebook, PureVQ-GAN destroys fine-grained trigger patterns while preserving semantic content. A GAN discriminator ensures outputs match the natural image distribution, preventing reconstruction of out-of-distribution perturbations. On CIFAR-10, PureVQ-GAN achieves a 0% poison success rate (PSR) against Gradient Matching and Bullseye Polytope attacks, and 1.64% against Narcissus, while maintaining 91–95% clean accuracy. Unlike diffusion-based defenses requiring hundreds of iterative refinement steps, PureVQ-GAN is over 50× faster, making it practical for real training pipelines.
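The core intuition—that snapping latents to a discrete codebook erases small, out-of-distribution perturbations—can be illustrated with a minimal sketch. This is not the paper's implementation (which uses a learned VQ-VAE encoder/decoder and a GAN discriminator); the `quantize` helper, the random codebook, and the synthetic "trigger" below are all illustrative assumptions showing only the nearest-neighbor quantization step:

```python
import numpy as np

# Illustrative sketch of a vector-quantization bottleneck: each latent
# vector is replaced by its nearest entry in a discrete codebook, so
# sub-codebook-scale perturbations (e.g. a fine-grained trigger) vanish.

def quantize(latents, codebook):
    """Snap each latent vector to its nearest codebook entry (L2 distance)."""
    # Pairwise squared distances, shape (n_latents, n_codes).
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[d.argmin(axis=1)]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))            # 16 hypothetical codes, dim 4

clean = codebook[[3, 7, 7, 12]]                # latents sitting on code vectors
trigger = 0.01 * rng.normal(size=clean.shape)  # tiny synthetic perturbation
poisoned = clean + trigger

purified = quantize(poisoned, codebook)
# Quantization maps the perturbed latents back to the same discrete codes,
# absorbing the trigger entirely.
print(np.allclose(purified, clean))
```

In the full method, this discrete lookup sits between a learned encoder and decoder, and the GAN discriminator pushes the decoded image toward the natural image manifold rather than toward any residual perturbation.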