🤖 AI Summary
Watermarks on AI-generated content are vulnerable to adversarial attacks, which undermines reliable provenance tracing and regulatory oversight. This paper introduces Warfare, the first unified watermark attack framework enabling both watermark erasure and forgery, and further proposes Warfare-Plus, which maintains a high attack success rate (>92%) while significantly improving computational efficiency. Methodologically, Warfare uses a pretrained diffusion model to preserve content fidelity and a GAN for fine-grained watermark manipulation. Extensive evaluations across multiple mainstream datasets and watermarking schemes demonstrate state-of-the-art attack success rates, alongside strong visual quality (37% reduction in FID) and semantic consistency (CLIP Score deviation <1.2%). The implementation is publicly available.
📝 Abstract
AI-Generated Content (AIGC) is rapidly expanding, with services using advanced generative models to create realistic images and fluent text. Regulating such content is crucial to prevent policy violations, such as unauthorized commercialization or unsafe content distribution. Watermarking is a promising solution for content attribution and verification, but we demonstrate its vulnerability to two key attacks: (1) watermark removal, where adversaries erase embedded marks to evade regulation, and (2) watermark forging, where they generate illicit content with forged watermarks, leading to misattribution. We propose Warfare, a unified attack framework leveraging a pre-trained diffusion model for content processing and a generative adversarial network for watermark manipulation. Evaluations across datasets and embedding setups show that Warfare achieves high success rates while preserving content quality. We further introduce Warfare-Plus, which enhances efficiency without compromising effectiveness. The code is available at https://github.com/GuanlinLee/warfare.
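To make the removal-attack idea concrete, the toy sketch below illustrates why regeneration-style processing can destroy an embedded watermark: a watermark is hidden in the least-significant bits of pixel values, and a "regeneration" step (here simulated with Gaussian perturbation plus re-quantization, as an illustrative stand-in for Warfare's actual diffusion-model pipeline) scrambles those bits while barely changing the image. The LSB scheme, noise level, and all function names are illustrative assumptions, not the paper's method.

```python
import random

random.seed(0)

def embed_lsb(pixels, bits):
    # Hide watermark bits in the least-significant bit of the first len(bits) pixels.
    out = pixels[:]
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b
    return out

def extract_lsb(pixels, n):
    # Read back the first n least-significant bits.
    return [p & 1 for p in pixels[:n]]

def regenerate(pixels, sigma=2.0):
    # Toy stand-in for diffusion-based regeneration: perturb each pixel
    # slightly, then re-quantize to valid 8-bit values.
    return [min(255, max(0, round(p + random.gauss(0, sigma)))) for p in pixels]

pixels = [random.randrange(256) for _ in range(4096)]   # fake 64x64 grayscale image
watermark = [random.randrange(2) for _ in range(256)]   # 256-bit watermark

marked = embed_lsb(pixels, watermark)
attacked = regenerate(marked)

acc_before = sum(a == b for a, b in zip(extract_lsb(marked, 256), watermark)) / 256
acc_after = sum(a == b for a, b in zip(extract_lsb(attacked, 256), watermark)) / 256
print(f"bit accuracy before attack: {acc_before:.2f}, after: {acc_after:.2f}")
```

After the perturb-and-requantize step, watermark bit accuracy drops to near chance level (~0.5) while per-pixel distortion stays within a few intensity levels, mirroring how regeneration attacks erase fragile watermarks without degrading perceptual quality.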