Diffusion Models are Certifiably Robust Classifiers

📅 2024-02-04
🏛️ Neural Information Processing Systems
📈 Citations: 10
Influential: 0
🤖 AI Summary
Diffusion classifiers exhibit strong empirical robustness but lack theoretical guarantees, leaving their reliability against unseen, stronger adversarial attacks in doubt. This work gives the first theoretical proof that diffusion classifiers are O(1)-Lipschitz, which enables provable ℓ₂-robustness certification. Building on this insight, the authors propose Noised Diffusion Classifiers (NDCs), which generalize diffusion classifiers to Gaussian-corrupted inputs by deriving the corresponding ELBOs, approximating likelihoods with them, and computing class probabilities via Bayes' theorem, yielding tight, retraining-free certification. NDCs constitute the first diffusion-based classification framework offering both formal provability and tight robustness bounds. Evaluated on CIFAR-10, NDCs attain certified accuracies of 80.2% and 70.5% under ℓ₂ perturbation radii of 0.25 and 0.5, respectively, using only a single pre-trained diffusion model and significantly outperforming existing methods.

📝 Abstract
Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among these, diffusion classifiers, utilizing powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, thereby obtaining much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norms less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
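The abstract's classification recipe — approximate each class-conditional log-likelihood by an ELBO, then apply Bayes' theorem — can be sketched in a few lines. With a uniform class prior, Bayes' theorem reduces to a softmax over the per-class ELBOs. The ELBO values below are hypothetical placeholders, not outputs of any real diffusion model:

```python
import math

def diffusion_classifier_probs(elbos):
    """Approximate p(y | x) from per-class ELBOs via Bayes' theorem.

    log p(x | y) is approximated by the ELBO of a class-conditional
    diffusion model; with a uniform prior p(y), Bayes' theorem gives
    p(y | x) proportional to exp(ELBO_y), i.e. a softmax over ELBOs.
    """
    m = max(elbos)  # subtract the max before exponentiating, for stability
    exps = [math.exp(e - m) for e in elbos]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-class ELBOs (log-likelihood lower bounds) for 3 classes:
probs = diffusion_classifier_probs([-1210.0, -1195.0, -1203.0])
# the class with the largest ELBO receives the highest posterior
```

This is only the decision rule; computing each ELBO requires averaging the diffusion model's denoising losses over timesteps, which is where the bulk of the inference cost lies.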
Problem

Research questions and friction points this paper is trying to address.

Proving theoretical robustness guarantees for diffusion classifiers
Establishing certified adversarial resilience with Lipschitz continuity bounds
Generalizing classifiers to handle Gaussian noise for tighter certifications
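The link between the Lipschitz bound above and a robustness certificate can be illustrated with the generic margin argument: if each class probability changes by at most L·‖δ‖₂ under a perturbation δ, the prediction cannot flip while the top-two margin exceeds 2L·‖δ‖₂. This is a standard bound for intuition only; the paper derives a tighter, diffusion-specific certificate:

```python
def certified_radius(probs, lipschitz_const):
    """Margin-based certified l2 radius for an L-Lipschitz classifier.

    Each class probability moves by at most L * ||delta||_2, so the
    top class stays on top whenever
        ||delta||_2 < (p_top - p_runner_up) / (2 * L).
    """
    ranked = sorted(probs, reverse=True)
    margin = ranked[0] - ranked[1]
    return margin / (2.0 * lipschitz_const)

r = certified_radius([0.7, 0.2, 0.1], lipschitz_const=1.0)
# margin 0.5 with L = 1 certifies radius 0.25
```

A constant (O(1)) Lipschitz bound already yields a nonzero certified radius; the NDC construction improves on this by making the effective Lipschitz constant input-dependent, which tightens the certificate.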
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proving diffusion classifiers' Lipschitzness and robustness
Generalizing to classify Gaussian-corrupted data distributions
Achieving certified robustness via ELBO approximation
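The "Gaussian-corrupted data" the NDCs classify are simply samples from N(x, σ²I) around a clean input x, the same distribution used in randomized-smoothing pipelines. A minimal sketch of the corruption step (the ELBO re-derivation for this noised distribution is the paper's contribution and is not shown here):

```python
import random

def gaussian_corrupt(x, sigma, rng=random):
    """Draw one sample from N(x, sigma^2 I) for a flat input vector x.

    NDCs are built to classify inputs drawn from this noised
    distribution rather than the clean x, which is what allows the
    tighter, non-constant Lipschitz certificate.
    """
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noised = gaussian_corrupt([0.5, -0.2, 1.0], sigma=0.25, rng=rng)
```

In practice σ is matched to the diffusion model's noise schedule, so a single off-the-shelf model covers the noised inputs without retraining.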