🤖 AI Summary
This work addresses the limited adversarial robustness of AI-generated image (AIGI) detectors, presenting a systematic evaluation of their vulnerability under both white-box and black-box settings. We propose the Frequency-based Post-train Bayesian Attack (FPBA): it adds adversarial perturbations in the frequency domain and applies a post-train Bayesian strategy that turns a single pre-trained surrogate into a Bayesian one, which simulates diverse victim models without re-training. This enables black-box attacks that transfer across architectures (CNNs and ViTs), generative models, and defensive mechanisms. Experiments demonstrate that FPBA significantly degrades detection accuracy across diverse AIGI detectors, multiple generative models (e.g., Stable Diffusion, DALL·E), and defenses (e.g., JPEG compression, feature squeezing). FPBA also evades cross-generator detection, a crucial real-world detection scenario. Our work provides a benchmark and methodological foundation for advancing the robustness evaluation and defense of AIGI detection systems.
📝 Abstract
Recent advances in image synthesis, particularly GANs and diffusion models, have amplified public concerns about the spread of disinformation. To address these concerns, numerous AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. To this end, we propose a new method to attack AIGI detectors. First, inspired by the obvious difference between real and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to further narrow the gap between heterogeneous AIGI detectors, e.g., when transferring adversarial examples across CNNs and ViTs. This is achieved by a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models with one pre-trained surrogate and without re-training. We name our method Frequency-based Post-train Bayesian Attack (FPBA). Through FPBA, we show that adversarial attacks are a real threat to AIGI detectors: FPBA delivers successful black-box attacks across models, generators, and defense methods, and even evades cross-generator detection, a crucial real-world detection scenario. The code will be shared upon acceptance.
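
The abstract gives only the two ideas behind FPBA, not the algorithm itself, so the following is a toy sketch under stated assumptions rather than the authors' implementation. It assumes a linear logistic "detector" on a small grayscale image, uses an orthonormal DCT as the frequency transform, takes a signed gradient step on the DCT coefficients, and approximates a Bayesian surrogate by averaging gradients over copies of the surrogate with Gaussian weight noise. All function names and parameters (`frequency_attack`, `sample_surrogates`, `eps`, `alpha`, `sigma`) are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix C (C times its transpose is the identity)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def prob_fake(image, weights, bias):
    """Toy detector: logistic regression on pixels, returns P(image is fake)."""
    z = bias + sum(w * x for w_row, x_row in zip(weights, image)
                   for w, x in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))


def sample_surrogates(weights, n_models, sigma, rng):
    """ASSUMPTION: Gaussian weight noise stands in for a Bayesian posterior."""
    return [[[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
            for _ in range(n_models)]


def frequency_attack(image, weights, bias, eps=0.2, alpha=0.01, steps=5,
                     n_models=4, sigma=0.01, seed=0):
    """Push a fake image (label 1) toward the 'real' side of the toy detector."""
    rng = random.Random(seed)
    n = len(image)
    c = dct_matrix(n)
    c_t = transpose(c)
    adv = [row[:] for row in image]
    for _ in range(steps):
        # Average the cross-entropy gradient w.r.t. pixels over sampled surrogates.
        grad = [[0.0] * n for _ in range(n)]
        for w_k in sample_surrogates(weights, n_models, sigma, rng):
            p = prob_fake(adv, w_k, bias)
            for i in range(n):
                for j in range(n):
                    grad[i][j] += (p - 1.0) * w_k[i][j] / n_models
        # Map image and gradient to the DCT domain; take a signed ascent step there.
        grad_f = matmul(matmul(c, grad), c_t)
        adv_f = matmul(matmul(c, adv), c_t)
        adv_f = [[a + alpha * ((g > 0) - (g < 0)) for a, g in zip(a_row, g_row)]
                 for a_row, g_row in zip(adv_f, grad_f)]
        # Back to pixels, then project onto the L-infinity ball and the range [0, 1].
        stepped = matmul(matmul(c_t, adv_f), c)
        adv = [[min(max(s, x - eps, 0.0), x + eps, 1.0) for s, x in zip(s_row, x_row)]
               for s_row, x_row in zip(stepped, image)]
    return adv
```

Calling `frequency_attack(image, weights, bias)` on a square image with pixel values in [0, 1] returns a perturbed copy that stays within `eps` of the original in every pixel. Because the step is taken on the sign of the DCT-domain gradient, the resulting pixel perturbation differs from a plain pixel-space signed step, which is the point of working in the frequency domain. A real attack would replace the linear detector with a trained network and automatic differentiation.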