🤖 AI Summary
Existing autoregressive (AR) image generation models lack native watermarking support, and diffusion-based watermarking methods are neither transferable to AR architectures nor robust against regeneration attacks. To address this, we propose Lexical Bias Watermarking (LBW), a watermarking framework natively compatible with AR architectures. LBW embeds watermarks during token prediction by dynamically steering vocabulary selection via green-list guidance, introducing lexical bias without modifying model weights. It is the first token-level, training-free, post-deployment watermarking mechanism for AR models. We further enhance white-box security through multi-green-list random sampling and decouple watermark embedding from detection to enable post-hoc verification. Extensive experiments across diverse AR models demonstrate >99.2% watermark detection accuracy, markedly better robustness to regeneration attacks than baselines, and a false positive rate <0.1%.
📝 Abstract
Autoregressive (AR) image generation models have gained increasing attention for their breakthroughs in synthesis quality, highlighting the need for robust watermarking to prevent misuse. However, existing in-generation watermarking techniques are primarily designed for diffusion models, where watermarks are embedded within diffusion latent states. This design poses significant challenges for direct adaptation to AR models, which generate images sequentially through token prediction. Moreover, diffusion-based regeneration attacks can effectively erase such watermarks by perturbing diffusion latent states. To address these challenges, we propose Lexical Bias Watermarking (LBW), a novel framework designed for AR models that resists regeneration attacks. LBW embeds watermarks directly into token maps by biasing token selection toward a predefined green list during generation. This approach ensures seamless integration with existing AR models and extends naturally to post-hoc watermarking. To increase security against white-box attacks, the green list for each image is randomly sampled from a pool of green lists rather than being fixed. Watermark detection is performed via quantization and statistical analysis of the token distribution. Extensive experiments demonstrate that LBW achieves superior watermark robustness, particularly in resisting regeneration attacks.
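To make the green-list mechanism concrete, here is a minimal sketch of biased token sampling and statistical detection. This is not the paper's implementation: the function names (`biased_sample`, `detect_z_score`), the additive logit bias `delta`, and the z-score detector are assumptions, modeled on logit-bias watermarking schemes for language models; LBW's actual detection additionally involves quantization of the token distribution, which is omitted here.

```python
import numpy as np

def biased_sample(logits, green_mask, delta=2.0, rng=None):
    """Sample the next token after adding a bias `delta` to green-list logits.

    `green_mask` is a 0/1 array over the vocabulary marking green tokens.
    (Illustrative sketch; `delta` and the additive bias are assumptions.)
    """
    rng = rng or np.random.default_rng()
    biased = logits + delta * green_mask          # lexical bias toward green tokens
    probs = np.exp(biased - biased.max())         # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

def detect_z_score(tokens, green_mask, gamma):
    """One-proposal detector: z-score on the green-token fraction.

    `gamma` is the expected green fraction under no watermark
    (|green list| / |vocabulary|). Large positive z => watermarked.
    """
    n = len(tokens)
    hits = sum(green_mask[t] for t in tokens)
    return (hits - gamma * n) / np.sqrt(gamma * (1 - gamma) * n)

if __name__ == "__main__":
    vocab = 100
    green_mask = np.zeros(vocab)
    green_mask[:50] = 1.0                         # toy green list: half the vocabulary
    rng = np.random.default_rng(0)
    logits = np.zeros(vocab)                      # toy uniform model distribution
    tokens = [biased_sample(logits, green_mask, delta=4.0, rng=rng)
              for _ in range(200)]
    print(f"watermarked z = {detect_z_score(tokens, green_mask, 0.5):.1f}")
```

With a pool of green lists (as in LBW's multi-green-list sampling), the detector would score the token map against each candidate list and take the maximum z, at the cost of a multiple-testing correction on the detection threshold.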