🤖 AI Summary
Existing deep image watermarking methods struggle to simultaneously achieve imperceptibility, robustness, and low latency. To address this, we propose HiWL, a hierarchical two-stage watermarking framework. In the first stage, distribution alignment learning constructs a shared latent space under two constraints: visual consistency between watermarked and non-watermarked images, and information invariance across watermark latent representations, so that multi-modal inputs (binary watermark messages and RGB cover images) are well represented. In the second stage, generalized watermark representation learning establishes a disentanglement policy that separates watermarks from image content in RGB space and strongly penalizes fluctuations among separated watermarks encoding the same message. By jointly optimizing watermark embedding in both the latent and RGB spaces, this design improves generalization and stability. Experiments demonstrate that HiWL achieves a 7.6% gain in watermark extraction accuracy over existing methods, processes 100,000 images in just 8 seconds, and substantially enhances robustness against common attacks while maintaining high imperceptibility, delivering both superior performance and real-time capability.
📝 Abstract
Deep image watermarking, which enables imperceptible watermark embedding and reliable extraction in cover images, has proven effective for copyright protection of image assets. However, existing methods face limitations in simultaneously satisfying three essential criteria for generalizable watermarking: 1) invisibility (imperceptible hiding of watermarks), 2) robustness (reliable watermark recovery under diverse conditions), and 3) broad applicability (low latency in the watermarking process). To address these limitations, we propose Hierarchical Watermark Learning (HiWL), a two-stage optimization framework that enables a watermarking model to meet all three criteria simultaneously. In the first stage, distribution alignment learning establishes a common latent space under two constraints: 1) visual consistency between watermarked and non-watermarked images, and 2) information invariance across watermark latent representations. In this way, multi-modal inputs including watermark messages (binary codes) and cover images (RGB pixels) can be well represented, thereby ensuring watermark invisibility and robustness in the watermarking process. The second stage employs generalized watermark representation learning to establish a disentanglement policy for separating watermarks from image content in RGB space. In particular, it strongly penalizes substantial fluctuations among separated RGB watermarks corresponding to identical messages. Consequently, HiWL effectively learns generalizable latent-space watermark representations while maintaining broad applicability. Extensive experiments demonstrate the effectiveness of the proposed method. In particular, it achieves 7.6% higher accuracy in watermark extraction than existing methods, while maintaining extremely low latency (100K images processed in 8s).
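To make the two-stage objective concrete, the constraints described above can be sketched as simple loss terms: a visual-consistency term (stage 1, constraint 1), an information-invariance term over latent codes of the same message (stage 1, constraint 2), and a penalty on fluctuations among separated RGB watermarks for an identical message (stage 2). This is a minimal illustrative sketch, not the paper's implementation; all function names, the use of MSE/variance as proxies, and the loss weights are assumptions.

```python
import numpy as np

# Illustrative sketch of the loss terms described in the abstract.
# MSE and variance are stand-ins; the paper's actual losses may differ.

def visual_consistency_loss(cover, watermarked):
    """Stage 1, constraint (1): watermarked image should match the cover."""
    return np.mean((cover - watermarked) ** 2)

def information_invariance_loss(latents):
    """Stage 1, constraint (2): latent codes of the same message should
    coincide; penalize deviation from their centroid."""
    centroid = latents.mean(axis=0, keepdims=True)
    return np.mean((latents - centroid) ** 2)

def fluctuation_penalty(rgb_watermarks):
    """Stage 2: penalize variation among separated RGB watermarks that
    encode the identical message (per-pixel variance as a proxy)."""
    return np.mean(np.var(rgb_watermarks, axis=0))

def hiwl_style_loss(cover, watermarked, latents, rgb_watermarks,
                    w_vis=1.0, w_inv=1.0, w_fluct=10.0):  # weights are guesses
    """Weighted sum of the three terms; weights are illustrative only."""
    return (w_vis * visual_consistency_loss(cover, watermarked)
            + w_inv * information_invariance_loss(latents)
            + w_fluct * fluctuation_penalty(rgb_watermarks))
```

In this reading, the "strong penalty" on fluctuations corresponds to a large weight on the stage-2 term, driving all RGB watermarks for one message toward a single generalizable representation.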