Pixel Seal: Adversarial-only training for invisible image and video watermarking

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing invisible watermarking methods face three critical bottlenecks: (i) proxy perceptual losses (e.g., MSE, LPIPS) are misaligned with human visual perception and introduce visible artifacts; (ii) conflicting multi-objective optimization leads to training instability and heavy reliance on manual hyperparameter tuning; (iii) robustness and imperceptibility degrade markedly on high-resolution images and videos. This paper proposes a purely adversarial training paradigm for invisible watermarking, featuring a three-stage decoupled optimization strategy, a JND-guided high-resolution adaptation mechanism, and training-time up-sampling simulation. Key technical components include JND-aware attenuation, temporal watermark pooling, and multi-stage scheduling. Experiments demonstrate superior robustness against diverse attacks compared with state-of-the-art methods, objective metrics and subjective evaluations confirm imperceptibility, and the method scales efficiently to HD video.
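The summary mentions JND-guided attenuation for high-resolution adaptation: the watermark residual is limited by a just-noticeable-difference map so that stronger perturbations are only placed where the eye tolerates them. The paper's actual JND model is not detailed here; the sketch below uses a toy Weber-law luminance-masking map as a hypothetical stand-in, with `luminance_jnd` and `attenuate` being illustrative names.

```python
import numpy as np

def luminance_jnd(img, base=3.0):
    # Toy JND map from luminance masking: the visibility threshold is
    # assumed to grow in very dark and very bright regions.
    # (Hypothetical stand-in for the paper's JND model.)
    lum = img.astype(np.float64) / 255.0
    return base * (1.0 + np.abs(lum - 0.5))

def attenuate(residual, img):
    # Clamp the watermark residual per pixel to the JND tolerance,
    # so the embedded signal stays below the visibility threshold.
    jnd = luminance_jnd(img)
    return np.clip(residual, -jnd, jnd)
```

In this reading, the embedder still produces an unconstrained residual, and the JND map acts as a per-pixel amplitude budget applied before the residual is added to the cover image.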

📝 Abstract
Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously difficult, with current approaches often struggling to balance robustness against true imperceptibility. This work introduces Pixel Seal, which sets a new state-of-the-art for image and video watermarking. We first identify three fundamental issues of existing methods: (i) the reliance on proxy perceptual losses such as MSE and LPIPS that fail to mimic human perception and result in visible watermark artifacts; (ii) the optimization instability caused by conflicting objectives, which necessitates exhaustive hyperparameter tuning; and (iii) reduced robustness and imperceptibility of watermarks when scaling models to high-resolution images and videos. To overcome these issues, we first propose an adversarial-only training paradigm that eliminates unreliable pixel-wise imperceptibility losses. Second, we introduce a three-stage training schedule that stabilizes convergence by decoupling robustness and imperceptibility. Third, we address the resolution gap via high-resolution adaptation, employing JND-based attenuation and training-time inference simulation to eliminate upscaling artifacts. We thoroughly evaluate the robustness and imperceptibility of Pixel Seal on different image types and across a wide range of transformations, and show clear improvements over the state-of-the-art. We finally demonstrate that the model efficiently adapts to video via temporal watermark pooling, positioning Pixel Seal as a practical and scalable solution for reliable provenance in real-world image and video settings.
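The abstract says the model adapts to video via temporal watermark pooling. The paper's exact mechanism is not spelled out in this summary; one plausible reading is that per-frame decoder outputs are pooled over time before the payload bits are thresholded, which is what this minimal sketch shows (`pool_decode` is an illustrative name, not the paper's API).

```python
import numpy as np

def pool_decode(frame_logits):
    # frame_logits: (T, n_bits) array of per-frame soft bit scores
    # from the watermark decoder. Temporal pooling averages the
    # scores over frames, then thresholds at 0 to recover the bits.
    pooled = frame_logits.mean(axis=0)
    return (pooled > 0).astype(int)
```

Averaging before thresholding lets frames where the watermark survived a transformation outvote frames where it was degraded, which is one way temporal redundancy could translate into robustness.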
Problem

Research questions and friction points this paper is trying to address.

Addresses reliance on proxy perceptual losses causing visible artifacts
Solves optimization instability from conflicting robustness and imperceptibility objectives
Overcomes reduced performance when scaling to high-resolution images and videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial-only training eliminates unreliable pixel-wise imperceptibility losses.
Three-stage training schedule decouples robustness and imperceptibility for stable convergence.
High-resolution adaptation with JND-based attenuation removes upscaling artifacts.
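The three innovations above fit a staged curriculum: robustness first, adversarial imperceptibility second, high-resolution adaptation last. The paper's stage boundaries and loss weights are not given in this summary, so the values in this sketch are purely illustrative.

```python
def stage_weights(step, s1_end=10_000, s2_end=50_000):
    # Hypothetical three-stage schedule decoupling robustness and
    # imperceptibility; boundaries and weights are illustrative,
    # not taken from the paper.
    if step < s1_end:
        # Stage 1: train embedder/decoder for decoding robustness only.
        return {"decode": 1.0, "adversarial": 0.0, "jnd_attenuation": False}
    if step < s2_end:
        # Stage 2: add the adversarial imperceptibility objective.
        return {"decode": 1.0, "adversarial": 0.5, "jnd_attenuation": False}
    # Stage 3: high-resolution adaptation with JND-based attenuation.
    return {"decode": 1.0, "adversarial": 0.5, "jnd_attenuation": True}
```

The point of such a schedule is that the decoder learns a recoverable signal before the adversarial critic starts suppressing visible structure, avoiding the tug-of-war that destabilizes joint training from step zero.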