AI Summary
In randomized smoothing, employing pre-trained diffusion denoising models introduces covariate shift due to biased noise estimation, degrading certified robustness. This shift arises because the standard denoising objective is misaligned with the actual noise distribution induced by the smoothing process. To address this, we propose an adversarial training objective tailored to the diffusion process: adversarial perturbations are injected during the noise addition stage, explicitly adapting the base classifier to the true smoothed noise distribution. Crucially, our method requires no architectural modifications to the denoising model and no retraining of the diffusion model; covariate shift is mitigated at its source solely by reformulating the training objective. Evaluated on MNIST, CIFAR-10, and ImageNet under l2 perturbations, our approach achieves state-of-the-art certified accuracy, significantly outperforming existing randomized smoothing and diffusion-augmented robustness methods.
Abstract
Randomized smoothing is a well-established method for achieving certified robustness against l2-adversarial perturbations. By incorporating a denoiser before the base classifier, pretrained classifiers can be seamlessly integrated into randomized smoothing without significant performance degradation. Among existing methods, Diffusion Denoised Smoothing - where a pretrained denoising diffusion model serves as the denoiser - has produced state-of-the-art results. However, we show that employing a denoising diffusion model introduces a covariate shift via misestimation of the added noise, ultimately degrading the smoothed classifier's performance. To address this issue, we propose a novel adversarial objective function that targets the noise added by the denoising diffusion model, a design motivated by our analysis of the covariate shift's origin. Our goal is to train the base classifier so that it is robust to the covariate shift introduced by the denoiser. Our method significantly improves certified accuracy across three standard classification benchmarks - MNIST, CIFAR-10, and ImageNet - achieving new state-of-the-art performance against l2-adversarial perturbations. Our implementation is publicly available at https://github.com/ahedayat/Robustifying-DDS-Against-Covariate-Shift
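To make the idea concrete, the following is a minimal, hypothetical sketch of an adversarial objective on the *added noise* in a denoised-smoothing pipeline. It is not the paper's implementation: the linear classifier, the closed-form Gaussian MMSE denoiser (standing in for a pretrained diffusion denoiser), and all names and hyperparameters here are illustrative assumptions. The key step it demonstrates is perturbing the smoothing noise (rather than the clean input) to maximize the classifier's loss on the denoised sample.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                      # smoothing noise level (illustrative)
num_classes, dim = 3, 8
W = rng.normal(size=(num_classes, dim))   # toy linear base classifier (assumption)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def denoise(z):
    # Closed-form MMSE denoiser E[x | x + sigma*eps = z] under an
    # x ~ N(0, I) prior -- a stand-in for a pretrained diffusion denoiser.
    return z / (1.0 + sigma**2)

def ce_loss_and_grad(x, y):
    # Cross-entropy of the linear classifier and its gradient w.r.t. x.
    p = softmax(W @ x)
    onehot = np.eye(num_classes)[y]
    return -np.log(p[y] + 1e-12), W.T @ (p - onehot)

def adversarial_noise(x, y, steps=5, alpha=0.1):
    # Perturb the added Gaussian noise eps0 so that the *denoised* sample
    # maximizes the classifier loss; training the base classifier on such
    # samples adapts it to the denoiser-induced covariate shift.
    eps0 = rng.normal(size=dim)
    delta = np.zeros(dim)
    for _ in range(steps):
        xhat = denoise(x + sigma * (eps0 + delta))
        _, g_x = ce_loss_and_grad(xhat, y)
        g_delta = (sigma / (1.0 + sigma**2)) * g_x   # chain rule through denoiser
        norm = np.linalg.norm(g_delta)
        if norm < 1e-12:
            break
        delta += alpha * g_delta / norm              # normalized l2 ascent step
    return eps0, eps0 + delta

x = rng.normal(size=dim)
y = 0
eps0, eps_adv = adversarial_noise(x, y)
loss_clean, _ = ce_loss_and_grad(denoise(x + sigma * eps0), y)
loss_adv, _ = ce_loss_and_grad(denoise(x + sigma * eps_adv), y)
```

Because the loss here is convex in the noise perturbation, each ascent step can only raise it, so `loss_adv >= loss_clean`; in the actual method the ascent would run through a diffusion denoiser and a deep classifier via automatic differentiation.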