🤖 AI Summary
Existing image watermarking methods are vulnerable to attacks and struggle to address the challenges posed by deepfakes and copyright protection. This work proposes a new paradigm, “attacking watermarks with watermarks”, built on the intrinsic similarity between watermark embedding and watermark attacks. The authors design a universal watermark removal strategy that requires no gradients, proxy models, or detection keys. By combining a lightweight classifier with re-watermarking, the method identifies the target watermark and overwrites it without white-box access. In experiments spanning 96 combinations of dataset, victim watermark, and attack watermark, re-watermarking reliably suppresses the original signal. The classifier identifies watermarks with overall accuracy of 0.878 to 0.953, and identification combined with re-watermarking reduces the bit accuracy of the original watermark by 25% to 48%, calling into question the robustness of current watermarking schemes.
📝 Abstract
Watermarking applies an imperceptible change to an input image that will trigger a detector, in order to assert provenance and protect intellectual property. The literature has shown great interest in attacks on watermarking schemes: attackers are clearly motivated to steal copyrighted material or circumvent legislated deepfake protections. In this work, we make a simple yet powerful observation: attacks on watermarking, like watermarks themselves, seek an imperceptible change to an input image (now already watermarked) that will trigger a detector. This analogy between watermark attacks and watermarking itself is highly suggestive: watermarks could be used to attack watermarks. Our first contribution validates this hypothesis. In rigorous experiments spanning 96 combinations of dataset, victim watermark, and attack watermark, we show that simply re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys. Our second contribution is a simple classifier for detecting the presence and identity of an existing watermark in a given image. Surprisingly, experiments demonstrate overall accuracies of 0.878 to 0.953. This result is of independent interest as a security vulnerability: research shows that method-specific attacks achieve substantially stronger removal than black-box attacks. Taken together, watermark identification combined with re-watermarking reduces bit accuracy by at least 25% and up to 48%. Our work constitutes a cheap, generic, and highly effective attack pipeline, calling into question the robustness of current watermarking schemes to such a simple attack, as well as the value of existing sophisticated attacks.
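To make the headline metric concrete, the following is a minimal sketch of bit accuracy: the fraction of message bits a detector decodes correctly. The function name, the 8-bit messages, and the decoded outputs are all hypothetical and not taken from the paper. The abstract does not say whether the reported 25% to 48% reduction is absolute or relative, so the sketch assumes an absolute difference purely for illustration.

```python
def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of positions where the decoded message matches the embedded one."""
    if len(embedded_bits) != len(decoded_bits):
        raise ValueError("messages must have the same length")
    if not embedded_bits:
        raise ValueError("messages must be non-empty")
    matches = sum(e == d for e, d in zip(embedded_bits, decoded_bits))
    return matches / len(embedded_bits)


# Hypothetical 8-bit message; illustrative values only, not results from the paper.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
decoded_before = [1, 0, 1, 1, 0, 0, 1, 0]  # assumed detector output on the watermarked image
decoded_after = [1, 1, 1, 0, 0, 1, 1, 1]   # assumed detector output after re-watermarking

acc_before = bit_accuracy(embedded, decoded_before)
acc_after = bit_accuracy(embedded, decoded_after)

# Assumption: the reduction is measured as an absolute difference in bit accuracy.
drop = acc_before - acc_after
print(f"before={acc_before:.3f} after={acc_after:.3f} drop={drop:.3f}")
```

For uniformly random messages, a decoder that guesses blindly has an expected bit accuracy of 0.5, so an attack that pushes bit accuracy toward 0.5 leaves the detector with little usable signal.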