🤖 AI Summary
Existing diffusion models (DMs) suffer from three critical limitations in adversarial purification: (1) the inherent vulnerability of pre-trained DMs to adversarial attacks, (2) severe semantic distortion during the reverse process, and (3) an aggravated trade-off between accuracy and robustness. This paper proposes a retraining-free robust reverse sampling framework that, for the first time, simultaneously enhances both accuracy and robustness without modifying the pre-trained DM. Our key contributions are: (1) an adversarially guided robust reverse process that preserves semantic consistency; (2) a fine-tuning-free purification architecture enabling adaptive defense against unseen attacks; and (3) a gradient constraint mechanism based on the noise predictor that suppresses perturbation propagation. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate state-of-the-art performance: standard accuracy and robust accuracy improve significantly and concurrently, outperforming prior methods.
📝 Abstract
Diffusion model (DM)-based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models are themselves not robust to adversarial attacks. Additionally, the diffusion process can easily destroy semantic information: after the reverse process it may generate a high-quality image that is entirely different from the original input, degrading standard accuracy. A natural remedy is to harness an adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, but this is computationally prohibitive. We instead propose a novel robust reverse process with adversarial guidance, which is independent of the given pre-trained DM and avoids retraining or fine-tuning it. This robust guidance not only ensures that purified examples retain more semantic content but also, for the first time, mitigates the accuracy-robustness trade-off of DMs, giving DM-based AP an efficient ability to adapt to new attacks. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and ImageNet to demonstrate that our method achieves state-of-the-art results and generalizes across different attacks.
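To make the purification pipeline concrete, the sketch below illustrates the general shape of DM-based AP: an input is partially noised by the forward diffusion process, then denoised by a reverse process augmented with a guidance term that pulls samples back toward the input so semantic content is retained. This is a minimal toy sketch, not the paper's method: the noise predictor is an analytic stand-in (assuming unit-Gaussian clean data), the distance-based guidance term is a simplification of the paper's adversarial guidance, and all names (`purify`, `guidance_scale`, `t_star`) are illustrative.

```python
import numpy as np

# Toy DDPM-style schedule (illustrative values, not tuned).
T = 50
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def toy_eps_predictor(x_t, t):
    # Stand-in for a pre-trained noise predictor eps_theta(x_t, t).
    # Assuming clean data ~ N(0, I), the optimal predictor is
    # eps = x_t * sqrt(1 - alpha_bar_t); analytic, for illustration only.
    return x_t * np.sqrt(1.0 - alpha_bars[t])

def purify(x_adv, t_star=30, guidance_scale=0.5, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # Forward diffusion: partially noise the (possibly adversarial) input
    # up to an intermediate timestep t_star, washing out small perturbations.
    eps = rng.standard_normal(x_adv.shape)
    x_t = (np.sqrt(alpha_bars[t_star]) * x_adv
           + np.sqrt(1.0 - alpha_bars[t_star]) * eps)
    # Guided reverse process: standard DDPM posterior mean plus a guidance
    # term (gradient of -||x_t - x_adv||^2 / 2) that keeps the sample close
    # to the input instead of resampling unrelated content.
    for t in range(t_star, 0, -1):
        eps_hat = toy_eps_predictor(x_t, t)
        mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
               / np.sqrt(alphas[t])
        mean = mean + guidance_scale * betas[t] * (x_adv - x_t)
        noise = rng.standard_normal(x_t.shape) if t > 1 else 0.0
        x_t = mean + np.sqrt(betas[t]) * noise
    return x_t
```

Without the guidance term, the reverse process can drift to a clean-looking but semantically different image; the pull toward `x_adv` is the simplest way to encode the semantic-consistency constraint the abstract describes.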