ZeroPur: Succinct Training-Free Adversarial Purification

📅 2024-06-05
🏛️ arXiv.org
🤖 AI Summary
Existing adversarial purification methods rely on external generative models or auxiliary modules, which require domain-specific retraining and incur substantial computational overhead. To address this, the authors propose ZeroPur, a training-free adversarial purification method that relies only on the victim classifier. Starting from the assumption that adversarial images are outliers of the natural image manifold, ZeroPur purifies in two steps: Guided Shift moves the adversarial example's embedding under the guidance of its blurred counterparts, and Adaptive Projection uses the shifted embedding to build a directional vector that supplies momentum for projecting the image back onto the manifold. ZeroPur introduces no external models and requires no retraining of the victim classifier or auxiliary functions. Experiments on CIFAR-10, CIFAR-100, and ImageNet-1K with ResNet and WideResNet classifiers report state-of-the-art robust performance.

📝 Abstract
Adversarial purification is a defense technique that can defend against various unseen adversarial attacks without modifying the victim classifier. Existing methods often depend on external generative models or on cooperation between auxiliary functions and victim classifiers. However, retraining generative models, auxiliary functions, or victim classifiers depends on the domain of the fine-tuning dataset and is computationally expensive. In this work, we suppose that adversarial images are outliers of the natural image manifold and that the purification process can be considered as returning them to this manifold. Following this assumption, we present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training. ZeroPur contains two steps: given an adversarial example, Guided Shift obtains the shifted embedding of the adversarial example under the guidance of its blurred counterparts; Adaptive Projection then constructs a directional vector from this shifted embedding to provide momentum, adaptively projecting adversarial images onto the manifold. ZeroPur is independent of external models and requires no retraining of victim classifiers or auxiliary functions, relying solely on the victim classifiers themselves to achieve purification. Extensive experiments on three datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) using various classifier architectures (ResNet, WideResNet) demonstrate that our method achieves state-of-the-art robust performance. The code will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Adversarial purification defends against unseen attacks without modifying classifiers
Existing methods rely on external models or retraining, which is computationally expensive
ZeroPur purifies adversarial images without training, using shifted embeddings and adaptive projection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free adversarial purification method
Guided Shift and Adaptive Projection steps
No external models or retraining required
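
The two steps named above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not give the losses, step sizes, or the adaptive rule, so everything concrete here is an assumption. The "classifier embedding" is a hypothetical linear map `W`, the image is a 1-D signal, the blur is a moving average, Guided Shift is assumed to maximize cosine similarity to the blurred input's embedding, and Adaptive Projection is assumed to be a fixed-step momentum update along the shifted direction. The names `blur`, `guided_shift`, `adaptive_projection`, `eps`, `step`, and `mu` are all illustrative.

```python
import numpy as np

def blur(x, k=3):
    """Moving-average blur of a 1-D signal (stand-in for an image blur)."""
    return np.convolve(x, np.ones(k) / k, mode="same")

def cosine_grad(W, x, target):
    """Analytic gradient of cos(W @ x, target) with respect to x."""
    e = W @ x
    ne = np.linalg.norm(e) + 1e-12
    nt = np.linalg.norm(target) + 1e-12
    d_e = target / (ne * nt) - (e @ target) * e / (ne**3 * nt)
    return W.T @ d_e

def clip_to_ball(x, x_ref, eps):
    """Keep x inside an L-infinity ball around x_ref and inside [0, 1]."""
    return np.clip(np.clip(x, x_ref - eps, x_ref + eps), 0.0, 1.0)

def guided_shift(W, x_adv, eps=0.05, step=0.01, iters=10):
    """Assumed form: pull the embedding toward that of the blurred input."""
    target = W @ blur(x_adv)  # embedding of the blurred counterpart, held fixed
    x = x_adv.copy()
    for _ in range(iters):
        g = cosine_grad(W, x, target)
        x = clip_to_ball(x + step * np.sign(g), x_adv, eps)
    return x

def projection_score(W, x, x_adv, x_shift):
    """Displacement of x's embedding along the direction set by x_shift."""
    d = W @ x_shift - W @ x_adv
    return float((W @ x - W @ x_adv) @ d / (np.linalg.norm(d) + 1e-12))

def adaptive_projection(W, x_adv, x_shift, eps=0.05, step=0.01, iters=10, mu=0.9):
    """Assumed form: momentum ascent on projection_score with a fixed step."""
    d = W @ x_shift - W @ x_adv
    d = d / (np.linalg.norm(d) + 1e-12)
    g = W.T @ d  # constant here because W is linear; varies for a real network
    v = np.zeros_like(x_adv)
    x = x_shift.copy()
    for _ in range(iters):
        v = mu * v + g  # momentum accumulation
        x = clip_to_ball(x + step * np.sign(v), x_adv, eps)
    return x

# Toy usage with random data (no real classifier or attack involved).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))
x_clean = np.clip(blur(rng.random(32), k=5), 0.0, 1.0)
x_adv = np.clip(x_clean + 0.03 * np.sign(rng.normal(size=32)), 0.0, 1.0)
x_shift = guided_shift(W, x_adv)
x_pur = adaptive_projection(W, x_adv, x_shift)
```

With a real victim classifier, `W @ x` would be replaced by the network's feature extractor and the analytic gradients by automatic differentiation. The sketch only shows the control flow: no parameters are trained, and the only model consulted is the classifier's own embedding.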