🤖 AI Summary
Existing deep image watermarking methods struggle to simultaneously achieve imperceptibility, robustness, and low latency. To address this, we propose HiWL, a hierarchical two-stage watermarking framework. In the first stage, distribution alignment learning constructs a shared latent space under two constraints: visual consistency between watermarked and non-watermarked images, and information invariance across watermark latent representations, so that multi-modal inputs (binary watermark messages and RGB cover images) are well represented. In the second stage, generalized watermark representation learning establishes a disentanglement policy that separates watermarks from image content in RGB space and strongly penalizes fluctuations among separated watermarks encoding the same message. By jointly optimizing watermark embedding in both the latent and RGB spaces, this design improves generalization and stability. Experiments demonstrate that HiWL achieves a 7.6% gain in watermark extraction accuracy over existing methods, processes 100,000 images in just 8 seconds, and substantially enhances robustness against common attacks while maintaining high imperceptibility, delivering both superior performance and real-time capability.
📝 Abstract
Deep image watermarking, which enables imperceptible watermark embedding and reliable extraction in cover images, has proven effective for copyright protection of image assets. However, existing methods face limitations in simultaneously satisfying three essential criteria for generalizable watermarking: 1) invisibility (imperceptible hiding of watermarks), 2) robustness (reliable watermark recovery under diverse conditions), and 3) broad applicability (low latency in the watermarking process). To address these limitations, we propose Hierarchical Watermark Learning (HiWL), a two-stage optimization framework that enables a watermarking model to meet all three criteria simultaneously. In the first stage, distribution alignment learning establishes a common latent space under two constraints: 1) visual consistency between watermarked and non-watermarked images, and 2) information invariance across watermark latent representations. In this way, multi-modal inputs including watermark messages (binary codes) and cover images (RGB pixels) can be well represented, thereby ensuring watermark invisibility and robustness in the watermarking process. The second stage employs generalized watermark representation learning to establish a disentanglement policy for separating watermarks from image content in RGB space. In particular, it strongly penalizes substantial fluctuations among separated RGB watermarks corresponding to identical messages. Consequently, HiWL effectively learns generalizable latent-space watermark representations while maintaining broad applicability. Extensive experiments demonstrate the effectiveness of the proposed method. In particular, it achieves 7.6% higher accuracy in watermark extraction than existing methods, while maintaining extremely low latency (100K images processed in 8s).
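To make the two-stage objective concrete, the constraints described above can be sketched as simple loss terms: a visual-consistency term (stage 1, constraint 1), an information-invariance term over latent codes of the same message (stage 1, constraint 2), and a penalty on fluctuations among separated RGB watermarks for an identical message (stage 2). This is a minimal illustrative sketch, not the paper's implementation; all function names, the use of MSE/variance as proxies, and the loss weights are assumptions.

```python
import numpy as np

# Illustrative sketch of the loss terms described in the abstract.
# MSE and variance are stand-ins; the paper's actual losses may differ.

def visual_consistency_loss(cover, watermarked):
    """Stage 1, constraint (1): watermarked image should match the cover."""
    return np.mean((cover - watermarked) ** 2)

def information_invariance_loss(latents):
    """Stage 1, constraint (2): latent codes of the same message should
    coincide; penalize deviation from their centroid."""
    centroid = latents.mean(axis=0, keepdims=True)
    return np.mean((latents - centroid) ** 2)

def fluctuation_penalty(rgb_watermarks):
    """Stage 2: penalize variation among separated RGB watermarks that
    encode the identical message (per-pixel variance as a proxy)."""
    return np.mean(np.var(rgb_watermarks, axis=0))

def hiwl_style_loss(cover, watermarked, latents, rgb_watermarks,
                    w_vis=1.0, w_inv=1.0, w_fluct=10.0):  # weights are guesses
    """Weighted sum of the three terms; weights are illustrative only."""
    return (w_vis * visual_consistency_loss(cover, watermarked)
            + w_inv * information_invariance_loss(latents)
            + w_fluct * fluctuation_penalty(rgb_watermarks))
```

In this reading, the "strong penalty" on fluctuations corresponds to a large weight on the stage-2 term, driving all RGB watermarks for one message toward a single generalizable representation.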