First-Place Solution to NeurIPS 2024 Invisible Watermark Removal Challenge

📅 2025-08-28

🤖 AI Summary

This paper addresses the open question of how robust invisible watermarks on digital media are to adversarial attacks. It proposes efficient watermark-removal attacks for black-box and beige-box (gray-box) settings. For the beige-box setting, it combines a VAE-based evasion attack with test-time optimization and color-contrast restoration in the CIELAB color space. For the black-box setting, it clusters images by their spatial- or frequency-domain artifacts and applies image-to-image diffusion models with controlled noise injection, guided by semantic priors from ChatGPT-generated captions. Experimental results show a 95.7% watermark removal rate while preserving high visual fidelity (PSNR > 38 dB), and the approach won first place in the NeurIPS 2024 challenge. To the best of our knowledge, this is the first work to achieve high-fidelity, high-success-rate watermark removal guided by semantic priors.
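The "controlled noise injection" step can be illustrated with the standard DDPM forward process, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε. The sketch below is a minimal illustration under stated assumptions: a linear β schedule with 1000 steps, images scaled to [-1, 1], and a hypothetical `strength` parameter in (0, 1] mapped to a timestep the way many img2img pipelines do. None of these values or names come from the paper.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def inject_noise(x0: np.ndarray, strength: float, rng: np.random.Generator,
                 num_steps: int = 1000) -> tuple[np.ndarray, int]:
    """Noise a clean image x0 (values in [-1, 1]) up to the timestep chosen by `strength`.

    Hypothetical img2img-style control: larger strength -> later timestep ->
    more of the input (and, ideally, of the embedded watermark) is destroyed
    before the diffusion model denoises it again.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    alpha_bar = linear_alpha_bar(num_steps)
    # Map strength to a 0-based timestep index in [0, num_steps - 1].
    t = max(int(round(strength * num_steps)), 1) - 1
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
    for s in (0.1, 0.3, 0.6):
        xt, t = inject_noise(x0, s, rng)
        ab = linear_alpha_bar()[t]
        print(f"strength={s}: t={t}, signal scale={np.sqrt(ab):.3f}, "
              f"noise scale={np.sqrt(1 - ab):.3f}")
```

The single knob trades removal against fidelity: a higher strength shrinks the surviving signal scale sqrt(ᾱ_t), which should erase more of the watermark but also gives the denoiser more freedom to alter image content.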

📝 Abstract

Content watermarking is an important tool for the authentication and copyright protection of digital media. However, it is unclear whether existing watermarks are robust against adversarial attacks. We present the winning solution to the NeurIPS 2024 Erasing the Invisible challenge, which stress-tests watermark robustness under varying degrees of adversary knowledge. The challenge consisted of two tracks, a black-box track and a beige-box track, distinguished by whether the adversary knows which watermarking method the provider used. For the beige-box track, we leverage an adaptive VAE-based evasion attack, combined with test-time optimization and color-contrast restoration in CIELAB space to preserve image quality. For the black-box track, we first cluster images based on their artifacts in the spatial or frequency domain. We then apply image-to-image diffusion models with controlled noise injection and semantic priors from ChatGPT-generated captions to each cluster, using parameter settings optimized per cluster. Empirical evaluations demonstrate that our method achieves near-perfect watermark removal (95.7%) with negligible impact on the quality of the resulting image. We hope that our attacks inspire the development of more robust image watermarking methods.
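The abstract does not spell out how color-contrast restoration in CIELAB space works. One plausible reading, sketched here purely as an assumption, is to convert the attacked output and the original input to CIELAB (sRGB, D65 white point) and match the per-channel mean and standard deviation of the output to those of the input. This restores global lightness contrast and color balance lost during the attack without copying pixel-level detail. The function names and the statistic-matching rule are hypothetical.

```python
import numpy as np

# sRGB (linear) -> CIE XYZ matrix and D65 reference white.
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_M_INV = np.linalg.inv(_M)
_WHITE = np.array([0.95047, 1.0, 1.08883])
_D = 6.0 / 29.0


def srgb_to_lab(rgb: np.ndarray) -> np.ndarray:
    """rgb: floats in [0, 1], shape (..., 3) -> CIELAB (L*, a*, b*)."""
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = lin @ _M.T / _WHITE
    f = np.where(xyz > _D ** 3, np.cbrt(xyz), xyz / (3 * _D ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def lab_to_srgb(lab: np.ndarray) -> np.ndarray:
    """Inverse of srgb_to_lab; out-of-gamut values are clipped to [0, 1]."""
    fy = (lab[..., 0] + 16.0) / 116.0
    fx = fy + lab[..., 1] / 500.0
    fz = fy - lab[..., 2] / 200.0
    f = np.stack([fx, fy, fz], axis=-1)
    xyz = np.where(f > _D, f ** 3, 3 * _D ** 2 * (f - 4.0 / 29.0)) * _WHITE
    lin = np.clip(xyz @ _M_INV.T, 0.0, 1.0)
    rgb = np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1 / 2.4) - 0.055)
    return np.clip(rgb, 0.0, 1.0)


def restore_contrast(attacked: np.ndarray, reference: np.ndarray,
                     eps: float = 1e-6) -> np.ndarray:
    """Hypothetical restoration: match each Lab channel's mean/std of `attacked` to `reference`."""
    lab_att = srgb_to_lab(attacked)
    lab_ref = srgb_to_lab(reference)
    mu_a, sd_a = lab_att.mean(axis=(0, 1)), lab_att.std(axis=(0, 1))
    mu_r, sd_r = lab_ref.mean(axis=(0, 1)), lab_ref.std(axis=(0, 1))
    matched = (lab_att - mu_a) / (sd_a + eps) * sd_r + mu_r
    return lab_to_srgb(matched)
```

In this reading, `reference` would be the watermarked input image. Because only global per-channel statistics are transferred, the step is intended to restore tone and contrast rather than reintroduce the spatially localized watermark signal.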

Problem

Research questions and friction points this paper is trying to address.

Removing invisible watermarks from digital media content

Testing watermark robustness against adversarial attacks

Achieving near-perfect removal while preserving image quality
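Image quality in this setting is often measured with PSNR = 10 · log10(MAX² / MSE), the metric quoted in the AI summary (PSNR > 38 dB); whether the challenge itself scored quality this way is not stated here. A minimal sketch for images scaled to [0, 1], with a helper (hypothetical name) that inverts the formula:

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(test, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - out) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def max_mse_for_psnr(target_db: float, max_value: float = 1.0) -> float:
    """Largest MSE that still reaches `target_db` (the PSNR formula solved for MSE)."""
    return max_value ** 2 / 10.0 ** (target_db / 10.0)
```

As a worked example, with MAX = 1 a PSNR above 38 dB requires MSE < 10^(-3.8) ≈ 1.58 × 10⁻⁴, i.e. an RMS error below about 0.0126, or roughly 3.2 intensity levels on a 0 to 255 scale.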

Innovation

Methods, ideas, or system contributions that make the work stand out.

VAE-based evasion attack with test-time optimization

Clustering images by spatial or frequency artifacts

Image-to-image diffusion with controlled noise injection and caption-based semantic priors
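The paper clusters black-box images by spatial- or frequency-domain artifacts, but the abstract does not say which features are used. As an illustration only, the sketch below computes one hypothetical frequency-domain feature, the fraction of spectral energy above a normalized radius, and splits images into two groups with a tiny 1-D k-means (k = 2). Both `high_freq_energy_ratio` and `two_means_1d` are made-up helpers, the cutoff is arbitrary, and spatial-domain features are omitted.

```python
import numpy as np


def high_freq_energy_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of non-DC spectral energy beyond normalized radius `cutoff` (1.0 = Nyquist on an axis)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray - gray.mean()))) ** 2
    h, w = gray.shape
    # Integer frequency coordinates aligned with fftshift, so index (h//2, w//2) is DC.
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    total = spec.sum()
    return float(spec[radius > cutoff].sum() / total) if total > 0 else 0.0


def two_means_1d(values, iters: int = 20) -> np.ndarray:
    """Minimal k-means with k=2 on scalar features; returns labels 0 (low) / 1 (high)."""
    v = np.asarray(values, dtype=np.float64)
    centers = np.array([v.min(), v.max()])
    labels = np.zeros(len(v), dtype=int)
    for _ in range(iters):
        labels = (np.abs(v - centers[1]) < np.abs(v - centers[0])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = v[labels == k].mean()
    return labels


if __name__ == "__main__":
    n = 32
    y, x = np.mgrid[0:n, 0:n]
    smooth = [np.sin(2 * np.pi * (x + s) / n) for s in range(3)]
    checker = 0.5 * (-1.0) ** (x + y)  # stand-in for a periodic, high-frequency artifact
    textured = [img + checker for img in smooth]
    feats = [high_freq_energy_ratio(img) for img in smooth + textured]
    print(np.round(feats, 3), two_means_1d(feats))
```

In the described pipeline, each resulting cluster would then receive its own optimized diffusion parameters; how the authors choose features and the number of clusters is not given in the abstract.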
