🤖 AI Summary
This note shows that the claimed robustness of the "Ensemble Everything Everywhere" (E4) defense is illusory. The defense's randomness and ensembling procedure cause severe gradient masking, so the standard attacks used in its original evaluation underestimate its vulnerability. E4's perceptually aligned gradients, offered as evidence of genuine robustness, turn out not to imply it. Using standard adaptive attack techniques under the ℓ∞-norm threat model with ε = 8/255, the authors reduce E4's robust accuracy from 62% to 11% on CIFAR-10 and from 48% to 14% on CIFAR-100. Beyond breaking this particular defense, the note illustrates how to diagnose gradient masking in randomized ensemble defenses and why adaptive attacks are necessary for evaluating them.
📝 Abstract
Ensemble everything everywhere is a defense against adversarial examples that was recently proposed to make image classifiers robust. This defense works by ensembling a model's intermediate representations at multiple noisy image resolutions, producing a single robust classification. This defense was shown to be effective against multiple state-of-the-art attacks. Perhaps even more convincingly, it was shown that the model's gradients are perceptually aligned: attacks against the model produce noise that perceptually resembles the targeted class. In this short note, we show that this defense is not robust to adversarial attack. We first show that the defense's randomness and ensembling method cause severe gradient masking. We then use standard adaptive attack techniques to reduce the defense's robust accuracy from 48% to 14% on CIFAR-100 and from 62% to 11% on CIFAR-10, under the $\ell_\infty$-norm threat model with $\varepsilon=8/255$.
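The standard adaptive technique for attacking a randomized defense like this one is Expectation over Transformation (EOT): instead of taking a single (noisy, masked) gradient, the attacker averages gradients over many draws of the defense's randomness before each PGD step. The sketch below is a minimal, hypothetical illustration of that idea — the `noisy` linear classifier is a toy stand-in for the defense, not the actual E4 model, and all names and constants are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a randomized defense: a linear classifier
# that adds Gaussian noise to its input before scoring. This mimics the
# *randomness* of the real defense, nothing more.
W = rng.normal(size=(10, 32))  # 10 classes, 32 input features


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def ce_loss_and_grad(x, y, noise_rng):
    """Cross-entropy loss on the true class y, and its gradient w.r.t.
    the input, for ONE draw of the defense's randomness."""
    z = W @ (x + 0.1 * noise_rng.normal(size=x.shape))
    p = softmax(z)
    loss = -np.log(p[y] + 1e-12)
    grad = W.T @ (p - np.eye(len(p))[y])  # analytic input gradient
    return loss, grad


def eot_pgd(x, y, eps=8 / 255, alpha=2 / 255, steps=40,
            eot_samples=16, seed=1):
    """PGD with Expectation over Transformation: average the gradient
    over several noise draws to get a stable ascent direction, then
    project back onto the l_inf ball of radius eps around x."""
    noise_rng = np.random.default_rng(seed)
    x_adv = x.copy()
    for _ in range(steps):
        g = np.mean([ce_loss_and_grad(x_adv, y, noise_rng)[1]
                     for _ in range(eot_samples)], axis=0)
        x_adv = x_adv + alpha * np.sign(g)           # ascend the loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)    # l_inf projection
        x_adv = np.clip(x_adv, 0.0, 1.0)             # valid image range
    return x_adv
```

Against a single noisy gradient, sign steps wander; averaging over `eot_samples` draws recovers the underlying ascent direction, which is the core reason adaptive attacks defeat randomness-based gradient masking.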