🤖 AI Summary
This work addresses the insufficient robustness of existing watermarking schemes for autoregressive (AR) image generators against removal and forgery attacks, a weakness that undermines reliable synthetic content detection and dataset filtering. It is the first to expose the vulnerability of such watermarks under a practical threat model in which the attacker has only a single watermarked reference image and no access to the original model parameters or watermarking secrets. The study introduces three new attacks (a vector-quantized regeneration removal attack, an adversarial optimization-based attack, and a frequency injection attack) and describes “Watermark Mimicry,” in which authentic images are manipulated to imitate a generator's watermark and falsely trigger detection. Experimental results show that current watermarking schemes are highly susceptible to these efficient attacks, which achieve high success rates and pose a serious threat to both synthetic image detection and training data curation pipelines.
📝 Abstract
The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs, both to mitigate misinformation and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques designed specifically for AR models embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and further introduce three new attacks: (i) a vector-quantized regeneration removal attack, (ii) an adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to a single watermarked reference image and without access to the original model parameters or watermarking secrets. Our findings indicate that existing watermarking schemes for AR image generation do not reliably support synthetic content detection for dataset filtering. Moreover, they enable Watermark Mimicry, whereby authentic images can be manipulated to imitate a generator's watermark and trigger false detection, preventing their inclusion in future model training.