Diffusion Denoising as a Certified Defense against Clean-label Poisoning

📅 2024-03-18
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses clean-label poisoning attacks, in which as few as 1% of training samples carry imperceptible adversarial perturbations that induce a targeted misclassification, by proposing the first provably robust defense built on pretrained diffusion models. The method requires no model retraining: it purifies tampered training samples via diffusion-based denoising and combines denoised smoothing with a p-norm-bounded perturbation model to establish certified robustness guarantees. The key contribution is the first use of off-the-shelf diffusion models inside a provable defense framework against clean-label poisoning. Evaluated against seven state-of-the-art attacks, the approach reduces attack success rates to 0–16% while incurring only a marginal drop in test accuracy, achieving a favorable trade-off between certified robustness and model utility and substantially outperforming existing defenses.

📝 Abstract
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by denoised smoothing, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0–16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
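As a rough illustration of the sanitization step the abstract describes, the sketch below adds Gaussian noise to each training sample and then applies a one-shot denoiser before the sample is used for training. Here `toy_denoise` is a hypothetical placeholder standing in for the off-the-shelf diffusion model's reverse pass; it is not the paper's implementation.

```python
import random

def toy_denoise(x, sigma):
    # Placeholder for a pretrained diffusion model's reverse (denoising)
    # pass; this toy version just shrinks coordinates toward zero.
    shrink = 1.0 / (1.0 + sigma ** 2)
    return [v * shrink for v in x]

def sanitize_dataset(dataset, sigma=0.25, seed=0):
    """Purify each (possibly poisoned) sample before training.

    dataset: list of (feature_list, label) pairs.
    """
    rng = random.Random(seed)
    purified = []
    for x, y in dataset:
        # Forward step: inject Gaussian noise at level sigma, which
        # drowns out any small p-norm-bounded poisoning perturbation.
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        # Reverse step: denoise to recover a clean-looking sample.
        purified.append((toy_denoise(noisy, sigma), y))
    return purified

# Labels stay untouched: the defense only edits the inputs.
clean = sanitize_dataset([([1.0, -2.0], 0), ([0.5, 0.5], 1)])
```

The downstream model is then trained on the purified set as usual, which is why no change to the training pipeline itself is needed.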
Problem

Research questions and friction points this paper is trying to address.

Defends against clean-label poisoning attacks using diffusion denoising
Reduces attack success to 0–16% with minimal accuracy drop
Provides certified robustness against adversarial training data tampering
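The certified guarantee noted above follows the randomized-smoothing template. As a sketch (not the paper's exact bound), the standard Cohen et al. $\ell_2$ certified radius can be computed from the smoothed classifier's top-two class probabilities:

```python
from statistics import NormalDist

def certified_radius(p_top, p_runner_up, sigma):
    """Standard randomized-smoothing l2 radius:
    R = (sigma / 2) * (Phi^-1(p_top) - Phi^-1(p_runner_up)).
    Any perturbation with l2 norm below R cannot flip the
    smoothed prediction.
    """
    phi_inv = NormalDist().inv_cdf
    return 0.5 * sigma * (phi_inv(p_top) - phi_inv(p_runner_up))

# A confident smoothed classifier yields a larger certificate.
r = certified_radius(0.9, 0.1, sigma=1.0)
```

The values 0.9 and 0.1 are illustrative probabilities; in practice they are estimated with confidence intervals over many noisy samples.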
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion denoising for data sanitization
Certified defense against clean-label poisoning
Maintains high test accuracy post-defense