🤖 AI Summary
Existing detectors of AI-generated images (AIGC detectors) exhibit security vulnerabilities under black-box API settings, where attackers can only access binary classification outputs via a limited number of queries, without knowledge of the model architecture or training data distribution.
Method: This paper introduces the first decision-based adversarial attack operating in the frequency domain. We propose a Discrete Cosine Transform (DCT) spectral partitioning strategy and an “adversarial soup” initialization mechanism to efficiently sample critical subspaces in the frequency domain, coupled with boundary-based feedback optimization to minimize query complexity.
Contribution/Results: Evaluated on Synthetic LSUN and GenImage benchmarks, our method achieves a 92.3% attack success rate within ≤1000 queries while preserving high visual fidelity. It significantly outperforms state-of-the-art black-box attacks in both query efficiency and perceptual quality.
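The DCT spectral-partitioning idea above, restricting random perturbations to one frequency band at a time so each query explores a small, informative subspace, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the band layout (partitioning by diagonal frequency index) and all function names are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_band_masks(size, n_bands):
    """Partition the 2-D DCT spectrum of a size x size image into n_bands
    boolean masks by diagonal frequency index (low to high frequencies).
    Illustrative partitioning scheme; the paper's exact bands may differ."""
    u, v = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    diag = u + v  # "ring" index of each DCT coefficient, 0 .. 2*(size-1)
    edges = np.linspace(0, 2 * (size - 1) + 1, n_bands + 1)
    return [(diag >= edges[i]) & (diag < edges[i + 1]) for i in range(n_bands)]

def perturb_band(img, band_mask, noise_scale, rng):
    """One query candidate: add Gaussian noise to the image's DCT spectrum
    only inside the selected frequency band, then invert the transform."""
    spec = dctn(img, norm="ortho")
    spec = spec + band_mask * rng.normal(0.0, noise_scale, img.shape)
    return idctn(spec, norm="ortho")
```

Because each candidate differs from the original image only inside one spectral band, the attacker can concentrate queries on the bands where generated and real images are known to diverge.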
📝 Abstract
The rapid development of Artificial Intelligence-Generated Content (AIGC) has heightened public anxiety about the spread of false information on social media. Designing detectors for filtering is an effective defense, but most detectors can be compromised by adversarial samples. Currently, most studies exposing AIGC security issues assume knowledge of the model structure and data distribution. In real applications, attackers query and interfere with models that provide services in the form of application programming interfaces (APIs), which constitutes the black-box decision-based attack paradigm. However, to the best of our knowledge, decision-based attacks on AIGC detectors remain unexplored. In this study, we propose **FBA²D**: a frequency-based black-box attack method for AIGC detection to fill this research gap. Motivated by frequency-domain discrepancies between generated and real images, we develop a decision-based attack that leverages the Discrete Cosine Transform (DCT) for fine-grained spectral partitioning and selects frequency bands as query subspaces, improving both query efficiency and image quality. Moreover, attacks on AIGC detectors should mitigate initialization failures, preserve image quality, and operate under strict query budgets. To address these issues, we adopt an "adversarial example soup" method, averaging candidates from successive surrogate iterations and using the result as the initialization to accelerate the query-based attack. Empirical studies on the Synthetic LSUN and GenImage datasets demonstrate the effectiveness of our proposed method. This study shows the urgency of addressing practical AIGC security problems.
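The "adversarial example soup" initialization described in the abstract can be sketched as running a surrogate attack, keeping its last few iterates, and averaging them to seed the query-based attack. The sign-gradient surrogate steps below are a generic stand-in (the paper's surrogate attack is not specified here), and every name is hypothetical.

```python
import numpy as np

def surrogate_iterates(x, grad_fn, step, n_iter, keep_last):
    """Run simple sign-gradient steps against a surrogate model (illustrative
    stand-in for the actual surrogate attack) and keep the final iterates."""
    iterates = []
    adv = x.copy()
    for _ in range(n_iter):
        adv = np.clip(adv + step * np.sign(grad_fn(adv)), 0.0, 1.0)
        iterates.append(adv.copy())
    return iterates[-keep_last:]

def adversarial_soup(iterates):
    """'Adversarial example soup': average candidate adversarial examples
    from successive surrogate iterations; the mean image is then used to
    initialize the decision-based (query) attack."""
    return np.mean(np.stack(iterates), axis=0)
```

Averaging the candidates smooths out iteration-specific noise, which is the stated motivation for using the soup as a more reliable starting point under a strict query budget.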