AI Summary
This work addresses the removal of imperceptible watermarks from AI-generated images without access to the internals of the watermarking algorithm. The authors propose a black-box attack, MarkSweep, that amplifies watermark noise in high-frequency regions through edge-aware Gaussian perturbation and trains a denoising network that combines a learnable frequency-domain decomposition module with a frequency-aware fusion module. Experimental results show that the method reduces the bit accuracy of watermarks embedded by HiDDeN and Stable Signature to below 67% while preserving high visual quality, exposing security vulnerabilities in current AI watermarking schemes.
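To make the reported metric concrete, bit accuracy is commonly computed as the fraction of decoded watermark bits that match the embedded message. The sketch below assumes that definition; the function name `bit_accuracy` and the 8-bit example message are illustrative and not taken from the paper.

```python
import numpy as np

def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of decoded bits that equal the embedded bits (assumed definition)."""
    embedded = np.asarray(embedded_bits, dtype=np.uint8)
    decoded = np.asarray(decoded_bits, dtype=np.uint8)
    if embedded.shape != decoded.shape:
        raise ValueError("bit strings must have the same length")
    return float(np.mean(embedded == decoded))

# Hypothetical 8-bit message: two bits are flipped after an attack.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = [1, 0, 0, 1, 0, 1, 1, 0]
acc = bit_accuracy(embedded, decoded)
print(acc)  # 6 of 8 bits match
```

For reference, a decoder that guesses each bit uniformly at random would score about 0.5 on average, so values near that level indicate that little of the message survives.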
Abstract
AI watermarking embeds invisible signals within images to provide provenance information and identify content as AI-generated. In this paper, we introduce MarkSweep, a novel watermark removal attack that erases the embedded watermarks from AI-generated images without degrading visual quality. MarkSweep first amplifies watermark noise in high-frequency regions via edge-aware Gaussian perturbations and injects it into clean images for training a denoising network. The network integrates two modules, a learnable frequency decomposition module and a frequency-aware fusion module, to suppress the amplified noise and eliminate watermark traces. Theoretical analysis and extensive experiments demonstrate that invisible watermarks are highly vulnerable to MarkSweep, which reduces the bit accuracy of the HiDDeN and Stable Signature watermarking schemes to below 67% while preserving the perceptual quality of AI-generated images.
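The idea of an edge-aware Gaussian perturbation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the gradient-magnitude edge map, the `sigma` and `floor` parameters, and the function names are all assumptions chosen only to show noise being concentrated near edges and added to a clean image to form a (noisy, clean) training pair for a denoiser.

```python
import numpy as np

def edge_map(image):
    """Normalized gradient magnitude of an HxWx3 image in [0, 1] (assumed edge detector)."""
    gray = image.mean(axis=2)
    gy, gx = np.gradient(gray)          # finite differences along rows, then columns
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else np.zeros_like(mag)

def edge_aware_perturb(image, sigma=0.05, floor=0.1, seed=0):
    """Add Gaussian noise whose strength grows with local edge magnitude.

    sigma and floor are hypothetical hyperparameters, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    weight = floor + (1.0 - floor) * edge_map(image)   # per-pixel weight in [floor, 1]
    noise = rng.normal(0.0, sigma, size=image.shape) * weight[..., None]
    return np.clip(image + noise, 0.0, 1.0)

# Toy clean image with a single vertical edge between columns 15 and 16.
clean = np.full((32, 32, 3), 0.25)
clean[:, 16:, :] = 0.75
noisy = edge_aware_perturb(clean)

# (noisy, clean) would serve as one input/target pair for denoiser training.
diff = np.abs(noisy - clean)
print(diff[:, 15:17].mean() > diff[:, :14].mean())  # noise is stronger at the edge
```

The `floor` term keeps a small amount of noise in flat regions so the perturbation is not confined to edges; whether MarkSweep does something similar is not stated in the abstract.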