Assessing Robustness via Score-Based Adversarial Image Generation

📅 2023-10-06
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing ℓₚ-norm-constrained adversarial attacks and defenses fail to capture many semantics-preserving perturbations, limiting the scope of robustness evaluation. To address this, the paper proposes ScoreAG, a framework that integrates score-based generative models (via SDE/ODE sampling) into adversarial example generation. ScoreAG moves beyond ℓₚ-norm constraints by defining the perturbation space through semantic consistency rather than pixel-level norms: adversarial examples must remain realistic images with the same core semantics. The framework supports both transforming existing images and synthesizing adversarial examples entirely from scratch, and additionally exploits its generative capability for adversarial purification to enhance classifier robustness. Across multiple benchmarks, ScoreAG matches the performance of state-of-the-art attacks and defenses, supporting semantic constraints as a more principled and generalizable paradigm for robustness assessment than conventional ℓₚ-norm bounds.
📝 Abstract
Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantic-preserving perturbations, and hence, the scope of robustness evaluations is limited. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages the advancements in score-based generative models to generate adversarial examples beyond $\ell_p$-norm constraints, so-called unrestricted adversarial examples, overcoming their limitations. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating realistic adversarial examples, either by transforming existing images or synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that ScoreAG matches the performance of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than $\ell_p$-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments.
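The core mechanism described above — sampling from a score-based generative model while a classifier-gradient term steers the sample toward misclassification — can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the 2-D Gaussian data model, its closed-form score (standing in for a learned score network), the linear classifier, and the deterministic ODE-style update are not the paper's actual implementation.

```python
import numpy as np

# Toy data distribution: a 2-D Gaussian N(MU, I). Its score function
# (the gradient of the log-density) is known in closed form and stands
# in for a learned score network s_theta(x) — an assumption for
# illustration, not the paper's setup.
MU = np.array([2.0, 2.0])

def score(x):
    return MU - x  # grad_x log p(x) for N(MU, I)

# Hypothetical linear classifier: predicts class 1 iff W @ x > 0.
W = np.array([1.0, 1.0])

def class1_logit(x):
    return float(W @ x)

def adversarial_sample(steps=400, step_size=0.02, guidance=2.5):
    """Deterministic (ODE-style) score sampling with adversarial guidance:
    the score term pulls the sample toward the data distribution, while
    the classifier-gradient term pushes it toward the wrong-class region."""
    x = np.zeros(2)  # start away from the data mode
    for _ in range(steps):
        drift = score(x) - guidance * W  # generative pull + adversarial push
        x = x + step_size * drift
    return x

x_adv = adversarial_sample()
print(class1_logit(MU), class1_logit(x_adv))  # clean logit > 0, adversarial logit < 0
```

In this toy setting the guided update settles at MU − guidance·W, where the generative pull and the adversarial push balance; with a learned score network and a real classifier, the same interplay is what bounds the example semantically rather than by an ℓₚ ball.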
Problem

Research questions and friction points this paper is trying to address.

Generates unrestricted adversarial examples beyond $\ell_p$-norm constraints.
Maintains image semantics while creating adversarial examples.
Enhances classifier robustness through image purification.
Innovation

Methods, ideas, or system contributions that make the work stand out.

ScoreAG generates unrestricted adversarial examples.
ScoreAG maintains image semantics during generation.
ScoreAG purifies images to enhance classifier robustness.
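The purification idea — perturb an incoming image with noise, then denoise it with the generative model so adversarial perturbations are washed out — can be sketched in the same toy setting. All names and parameters here are illustrative assumptions (a 2-D Gaussian with closed-form score standing in for a learned score network), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data distribution N(MU, I); its closed-form score stands in for a
# learned score network (an assumption for illustration).
MU = np.array([2.0, 2.0])

def score(x):
    return MU - x  # grad_x log p(x) for N(MU, I)

def purify(x_in, sigma=0.5, steps=200, step_size=0.02):
    """Generative purification sketch: perturb the input with Gaussian
    noise (forward process), then run deterministic score-based denoising
    steps that pull it back toward the data distribution."""
    x = x_in + sigma * rng.normal(size=x_in.shape)  # forward noising
    for _ in range(steps):
        x = x + step_size * score(x)  # reverse / denoising steps
    return x

x_adv = np.array([-0.5, -0.5])  # hypothetical adversarially shifted input
x_pure = purify(x_adv)
```

A downstream classifier then sees `x_pure` instead of `x_adv`: the noising step destroys the carefully crafted perturbation, and the score model reconstructs only structure that lies on the data manifold.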