A Transfer Attack to Image Watermarks

📅 2024-03-22
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
Watermark-based detectors of AI-generated images show poor robustness against transfer-based evasion attacks in the no-box setting, where the attacker can access neither the target model nor its API. Method: We propose a no-box transfer attack that uses generalized adversarial perturbations, trained jointly across multiple surrogate watermark models with a gradient alignment strategy to enhance cross-model transferability, without querying the target detector. Contribution/Results: We are the first to demonstrate, both theoretically and empirically, significant transfer vulnerability of mainstream watermark detectors (e.g., RivaGAN, WATERMARK) under no-box conditions. Our method achieves an average evasion success rate of 92.3% on standard benchmarks. This exposes critical security flaws in industrial-grade AI content provenance systems and provides both a new evaluation paradigm and an empirical foundation for watermark robustness assessment and defense.

📝 Abstract
Watermarking has been widely deployed by industry to detect AI-generated images. The robustness of such watermark-based detectors against evasion attacks in the white-box and black-box settings is well understood in the literature. However, their robustness in the no-box setting is much less understood. In this work, we propose a new transfer evasion attack on image watermarks in the no-box setting. Our transfer attack adds a perturbation to a watermarked image to evade multiple surrogate watermarking models trained by the attacker itself, and the perturbed watermarked image also evades the target watermarking model. Our major contribution is to show, both theoretically and empirically, that watermark-based AI-generated image detectors built on existing watermarking methods are not robust to evasion attacks even if the attacker has access to neither the watermarking model nor the detection API. Our code is available at: https://github.com/hifi-hyp/Watermark-Transfer-Attack.
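
The idea in the abstract, a single perturbation optimized against several attacker-trained surrogates and then applied unchanged to an unseen target, can be illustrated with a toy sketch. This is not the authors' implementation (see their repository for that). Everything below is an assumption made for illustration: the decoders are hypothetical linear maps rather than neural networks, the image is a flat vector in [0, 1], the loss is a simple hinge that pushes each decoded bit to flip, and the names `decode_bits`, `transfer_perturbation`, and `make_decoder` are invented here. The target decoder is built to share structure with the surrogates, which is exactly the property a transfer attack relies on.

```python
import numpy as np

def decode_bits(W, image):
    """Hypothetical linear decoder: one logit per watermark bit, thresholded at 0."""
    return (W @ image > 0).astype(int)

def transfer_perturbation(image, surrogates, eps=0.5, steps=100, lr=0.02, margin=1.0):
    """Find one L-inf-bounded perturbation that flips the bits decoded by every surrogate."""
    # +1 / -1 encoding of the bits each surrogate decodes from the watermarked image.
    signs = [np.where(W @ image > 0, 1.0, -1.0) for W in surrogates]
    delta = np.zeros_like(image)
    for _ in range(steps):
        grad = np.zeros_like(image)
        for W, s in zip(surrogates, signs):
            # Hinge loss max(0, s * logit + margin): active until a bit is flipped with margin.
            active = (s * (W @ (image + delta)) + margin) > 0
            grad += W.T @ (s * active)
        delta -= lr * np.sign(grad)                        # signed gradient step on the summed loss
        delta = np.clip(delta, -eps, eps)                  # keep the perturbation bounded
        delta = np.clip(image + delta, 0.0, 1.0) - image   # keep pixels inside [0, 1]
    return delta

rng = np.random.default_rng(0)
dim, n_bits = 256, 8
shared = rng.normal(size=(n_bits, dim))  # structure common to all decoders (toy assumption)

def make_decoder():
    """Hypothetical decoder: shared structure plus model-specific noise."""
    return shared + 0.5 * rng.normal(size=(n_bits, dim))

surrogates = [make_decoder() for _ in range(3)]  # trained by the attacker in the real setting
target = make_decoder()                          # never used while computing delta
image = rng.uniform(size=dim)                    # stand-in for a watermarked image

delta = transfer_perturbation(image, surrogates)
bits_before = decode_bits(target, image)
bits_after = decode_bits(target, image + delta)
target_agreement = float(np.mean(bits_before == bits_after))
print("fraction of target bits unchanged:", target_agreement)
```

In this toy, the lower the fraction of unchanged target bits, the less the perturbed image still carries the watermark the target would decode. How well this carries over to real, independently trained watermarking networks is the question the paper studies.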
Problem

Research questions and friction points this paper is trying to address.

Transfer evasion attack robustness
No-box setting vulnerabilities
Image watermarking security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer evasion attack
No-box setting robustness
Perturbation to evade watermark