🤖 AI Summary
This study addresses a gap in existing AI-based image watermark removal methods: they are effective at evading watermark detectors, but they fail to achieve "forensic stealth", the property that removal-processed outputs are indistinguishable from clean images to independent forensic detectors. The work proposes forensic stealth as a core evaluation criterion for watermark removal and systematically evaluates six state-of-the-art removers spanning four attack families. Independent forensic detectors, operating at a 1% false-positive rate, distinguish removal-processed outputs from clean images at a true-positive rate above 98%. A detailed case study of UnMarker (IEEE S&P 2025) shows that the detectable signal persists under common post-processing and exhibits a characteristic two-regime spectral deformation. Together, the results reveal a three-way tension among watermark evasion, image fidelity, and forensic stealth.
📝 Abstract
Watermarks for AI-generated images are meant to support downstream decisions about provenance, manipulation, and trust. In the settings that motivate watermark removal, therefore, success means more than causing the watermark test to fail. A successful remover must also preserve the utility of the image and make the output forensically indistinguishable from clean content, so that defeating the verifier restores deniability rather than merely replacing one detection signal with another. We show that current watermark removal attacks fail this stronger objective. Across six state-of-the-art removers spanning four attack families, independent forensic detectors distinguish removal-processed outputs from clean images at over 98% true-positive rate under a 1% false-positive budget. Thus, current removers often replace the watermark with a different detectable signal. Using UnMarker (IEEE S&P 2025) as a detailed case study, we show that this signal persists under common post-processing, exhibits a characteristic two-regime spectral deformation, and yields a three-way tension among removal success, image quality, and forensic stealth. These results show that existing removal benchmarks are incomplete: they reward verifier evasion and utility preservation while omitting forensic stealth. A workable watermark remover must satisfy all three conditions at once: watermark evasion, utility preservation, and forensic indistinguishability from clean content.
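The headline metric above, true-positive rate under a fixed false-positive budget, can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function name `tpr_at_fpr`, the score convention (higher means "more likely removal-processed"), and the toy scores are all assumptions made for illustration.

```python
import math

def tpr_at_fpr(clean_scores, processed_scores, max_fpr=0.01):
    """Return (tpr, fpr, threshold) for a detector under a false-positive budget.

    Hypothetical helper: assumes a higher score means the detector considers
    the image more likely to be removal-processed.
    """
    n = len(clean_scores)
    # Number of clean images we may flag without exceeding the budget.
    allowed = math.floor(max_fpr * n + 1e-9)
    ranked = sorted(clean_scores, reverse=True)
    # Flag scores strictly above the (allowed + 1)-th largest clean score,
    # so at most `allowed` clean images are flagged.
    threshold = ranked[allowed] if allowed < n else float("-inf")
    fpr = sum(s > threshold for s in clean_scores) / n
    tpr = sum(s > threshold for s in processed_scores) / len(processed_scores)
    return tpr, fpr, threshold

# Toy scores, invented for illustration only.
clean = [i / 100 for i in range(100)]   # 0.00, 0.01, ..., 0.99
processed = [0.985, 0.99, 1.0, 0.5]
tpr, fpr, threshold = tpr_at_fpr(clean, processed, max_fpr=0.01)
```

The threshold is calibrated on clean images only, so the false-positive budget holds regardless of how the processed images score. A detector that reaches a high true-positive rate at such a threshold separates the two populations even when very few clean images may be flagged.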