🤖 AI Summary
This work addresses a fundamental vulnerability shared by existing diffusion model watermarking techniques, all of which rely on reconstructible denoising trajectories for verification. The authors first expose and exploit this dependency with a training-free, black-box attack that randomly resamples the diffusion process in latent space, deflecting the generation trajectory. This perturbation statistically decouples the reconstructed image from the original watermark trajectory while preserving visual quality and semantic consistency. Requiring neither prior knowledge of the watermark nor any model fine-tuning, the method achieves 95%–100% watermark removal success across nine state-of-the-art watermarking schemes, demonstrating strong generality, efficiency, and practical applicability.
📝 Abstract
Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose $\underline{\mathbf{S}}$tochastic $\underline{\mathbf{Hi}}$dden-Trajectory De$\underline{\mathbf{f}}$lec$\underline{\mathbf{t}}$ion ($\mathbf{SHIFT}$), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%--100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.
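The core idea of the attack can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the linear noise schedule, and the `denoiser` stub are all illustrative assumptions. It re-noises a watermarked latent to an intermediate timestep and then runs a stochastic DDPM-style reverse process; the fresh Gaussian noise injected at every step deflects the trajectory, so the resulting latent is statistically decoupled from the original watermark-embedded trajectory while staying close to the input content.

```python
import numpy as np

def shift_resample_sketch(latent_wm, denoiser, alphas_cumprod, t_renoise, rng):
    """Illustrative sketch of stochastic trajectory deflection (not the paper's code).

    latent_wm:      watermarked latent x_0 (any-shaped array)
    denoiser:       callable (x, t) -> predicted noise eps_hat (here a stand-in
                    for the diffusion model's noise predictor)
    alphas_cumprod: cumulative products abar_t of the noise schedule
    t_renoise:      intermediate timestep to re-noise to
    rng:            numpy Generator supplying the deflecting randomness
    """
    # Forward-noise to timestep t: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps
    abar_t = alphas_cumprod[t_renoise]
    x = np.sqrt(abar_t) * latent_wm \
        + np.sqrt(1.0 - abar_t) * rng.standard_normal(latent_wm.shape)

    # Per-step alphas recovered from the cumulative products
    alphas = alphas_cumprod / np.concatenate(([1.0], alphas_cumprod[:-1]))

    # Stochastic (ancestral) reverse diffusion back to t = 0
    for t in range(t_renoise, -1, -1):
        eps_hat = denoiser(x, t)
        abar, a = alphas_cumprod[t], alphas[t]
        # DDPM posterior mean for x_{t-1} given x_t and eps_hat
        mean = (x - (1.0 - a) / np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(a)
        if t > 0:
            abar_prev = alphas_cumprod[t - 1]
            var = (1.0 - abar_prev) / (1.0 - abar) * (1.0 - a)
            # Fresh noise at every step: this is what deflects the trajectory
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x
```

Because each reverse step injects independent noise, two runs with different random seeds yield different final latents even from the same watermarked input, which is the statistical decoupling the attack relies on; a trajectory-based verifier attempting to invert the result no longer recovers the watermark-embedded path.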