Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

📅 2024-07-30

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the limited robustness of AI-generated image (AIGI) detectors against adversarial attacks, systematically evaluating their vulnerability under both white-box and black-box settings. It proposes the Frequency-based Post-train Bayesian Attack (FPBA), which combines two ideas. First, it adds perturbations in the frequency domain to push an image away from its original frequency distribution. Second, a post-train Bayesian strategy turns a single pre-trained surrogate into a Bayesian one without re-training, so that one surrogate can simulate diverse victim models and adversarial examples transfer across heterogeneous architectures (CNNs and ViTs). Experiments show that FPBA delivers successful black-box attacks across AIGI detectors, generative models (e.g., Stable Diffusion, DALL·E), and defenses (e.g., JPEG compression, feature squeezing), and that it can also evade cross-generator detection. The work provides a basis for evaluating and improving the robustness of AIGI detection systems.

📝 Abstract

Recent advancements in image synthesis, particularly with the advent of GANs and diffusion models, have amplified public concerns regarding the dissemination of disinformation. To address such concerns, numerous AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. To this end, we propose a new method to attack AIGI detectors. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to narrow the gap between heterogeneous AIGI detectors, e.g., transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate, without the need for re-training. We name our method Frequency-based Post-train Bayesian Attack, or FPBA. Through FPBA, we show that adversarial attacks are a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, and defense methods, and can even evade cross-generator detection, which is a crucial real-world detection scenario. The code will be shared upon acceptance.
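
The idea of perturbing an image in the frequency domain can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it assumes a toy linear detector (`fake_score`, hypothetical), an orthonormal DCT as the frequency transform, and a single signed-gradient step on the DCT coefficients. The step size and all function names are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # pixel index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fake_score(x, w, b):
    """Toy linear detector (hypothetical): probability that x is AI-generated."""
    return sigmoid(np.sum(w * x) + b)

def frequency_step(x, w, b, step=0.05):
    """One signed-gradient step on the DCT coefficients of a 'fake' image."""
    ch = dct_matrix(x.shape[0])
    cw = dct_matrix(x.shape[1])
    coeffs = ch @ x @ cw.T                    # image -> frequency domain
    p = fake_score(x, w, b)
    # The loss for the true label 'fake' is -log(p); its pixel gradient is -(1 - p) * w.
    grad_x = -(1.0 - p) * w
    # Because the DCT is orthonormal, the coefficient gradient is the DCT of grad_x.
    grad_coeffs = ch @ grad_x @ cw.T
    coeffs_adv = coeffs + step * np.sign(grad_coeffs)   # ascend the loss
    x_adv = ch.T @ coeffs_adv @ cw            # frequency -> image domain
    return np.clip(x_adv, 0.0, 1.0)           # keep pixels in [0, 1]
```

Calling `frequency_step(x, w, b)` on a grayscale image `x` with values in [0, 1] returns an image whose DCT coefficients have each moved by at most `step` before clipping. A real detector would need automatic differentiation in place of the closed-form gradient used here.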

Problem

Research questions and friction points this paper is trying to address.

Assessing adversarial robustness of AI-generated image detectors.

Developing a method to attack detectors using frequency domain perturbations.

Enhancing attack transferability across different models and generators.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency domain perturbations enhance attack effectiveness.

Post-train Bayesian strategy simulates diverse victim models.

FPBA enables cross-model and cross-generator adversarial attacks.
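
The second idea above, simulating diverse victim models from one pre-trained surrogate, might be sketched as follows. The sketch assumes an isotropic Gaussian around the pre-trained weights as a stand-in posterior and a toy linear detector; both are illustrative assumptions, not the paper's post-train Bayesian strategy, and every name and default value is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient(x, w, b):
    """Pixel gradient of -log p(fake | x) for a toy linear detector (hypothetical)."""
    p = sigmoid(np.sum(w * x) + b)
    return -(1.0 - p) * w

def sample_surrogates(w, n_samples, sigma, rng):
    """Draw weight samples around the pre-trained weights; no re-training involved.

    The isotropic Gaussian is an illustrative assumption, not the paper's posterior.
    """
    return w[None, ...] + sigma * rng.normal(size=(n_samples,) + w.shape)

def bayesian_sign_step(x, w, b, n_samples=16, sigma=0.1, step=0.01, seed=0):
    """One signed-gradient step using the gradient averaged over sampled surrogates."""
    rng = np.random.default_rng(seed)
    samples = sample_surrogates(w, n_samples, sigma, rng)
    # Averaging over samples approximates an expectation over plausible models.
    mean_grad = np.mean([loss_gradient(x, ws, b) for ws in samples], axis=0)
    return np.clip(x + step * np.sign(mean_grad), 0.0, 1.0)
```

Averaging the gradient over several sampled surrogates, rather than using the single pre-trained one, is intended to reduce overfitting of the perturbation to one model, which is the usual motivation for ensemble-style transfer attacks.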

🔎 Similar Papers

Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors