🤖 AI Summary
This work exposes a critical security weakness in current invisible watermarking schemes for generative AI: existing methods are highly susceptible to forgery attacks even under a no-box setting. To demonstrate this, we propose DiffForge, the first watermark forgery framework capable of forging imperceptible watermarks in that setting. Its core components are: (1) watermark distribution estimation using an unconditional diffusion model; and (2) a shallow inversion mechanism that injects the watermark into a non-watermarked image and adaptively selects the depth of inversion steps, guided by the observation that watermarks degrade as noise is added during the early diffusion phases. Experiments show that DiffForge deceives open-source watermark detectors with a 96.38% success rate and misleads a commercial watermark system with a success rate above 97%, while preserving imperceptibility and image quality. These results reveal fundamental security limitations in prevailing watermarking paradigms.
Abstract
Invisible watermarking is critical for content provenance and accountability in Generative AI. Although commercial companies have increasingly committed to using watermarks, the robustness of existing watermarking schemes against forgery attacks is understudied. This paper proposes DiffForge, the first watermark forgery framework capable of forging imperceptible watermarks under a no-box setting. We estimate the watermark distribution using an unconditional diffusion model and introduce shallow inversion to seamlessly inject the watermark into a non-watermarked image. This approach injects the watermark while preserving image quality by adaptively selecting the depth of inversion steps, leveraging our key insight that watermarks degrade with added noise during the early diffusion phases. Comprehensive evaluations show that DiffForge deceives open-source watermark detectors with a 96.38% success rate and misleads a commercial watermark system with a success rate above 97%, achieving high confidence. This work reveals fundamental security limitations in current watermarking paradigms.
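The key insight above, that a watermark degrades as noise is added in the early diffusion phases, can be illustrated with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below is only an illustration under stated assumptions, not DiffForge's actual procedure: the linear beta schedule, the watermark amplitude `watermark_rms`, the threshold `snr_threshold`, and the helper `shallowest_step_below` are all hypothetical choices made for this example.

```python
import math


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear DDPM beta schedule.

    The schedule parameters are common defaults, assumed here for illustration.
    """
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def watermark_snr(alpha_bar, watermark_rms):
    """Surviving watermark amplitude divided by the noise std at one step.

    Follows x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, treating
    the watermark as a small perturbation of RMS amplitude `watermark_rms`.
    """
    return math.sqrt(alpha_bar) * watermark_rms / math.sqrt(1.0 - alpha_bar)


def shallowest_step_below(alpha_bars, watermark_rms, snr_threshold=1.0):
    """Hypothetical rule: first step (1-indexed) where the SNR drops below
    the threshold. Not the selection rule used in the paper."""
    for t, alpha_bar in enumerate(alpha_bars, start=1):
        if watermark_snr(alpha_bar, watermark_rms) < snr_threshold:
            return t
    return len(alpha_bars)


if __name__ == "__main__":
    schedule = alpha_bar_schedule()
    # Assumed perturbation amplitude on a [-1, 1] pixel scale.
    depth = shallowest_step_below(schedule, watermark_rms=0.02)
    print(f"watermark-scale signal is drowned by noise after {depth} steps")
```

Under these assumed values, a low-amplitude perturbation is overtaken by noise within a handful of steps out of 1000, which is consistent with the idea that a shallow inversion depth can be enough to affect the watermark while leaving most image content intact.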