🤖 AI Summary
This paper addresses the challenge of achieving global robustness, precise localization, and resilience to perturbations in small regions at the same time, for image watermarking used in tamper detection and localized protection. To this end, we propose MaskMark, an Encoder-Distortion-Decoder framework with three components: a decoder-side masking mechanism that lets the decoder learn accurate local watermark extraction, a lightweight watermark localization network that identifies watermarked regions more accurately, and mask-aware encoding that improves robustness in small regions. Extensive experiments show that MaskMark achieves state-of-the-art performance in joint global and local watermark embedding and extraction, multi-watermark embedding, and watermark localization. It preserves high visual fidelity and trains in 20 hours on a single A6000 GPU, 1/15 the computational cost of WAM.
📝 Abstract
We present MaskMark, a simple, efficient, and flexible framework for image watermarking. MaskMark has two variants: MaskMark-D, which supports global watermark embedding, watermark localization, and local watermark extraction for applications such as tamper detection, and MaskMark-ED, which focuses on local watermark embedding and extraction with enhanced robustness in small regions, enabling localized image protection. Built upon the classical Encoder-Distortion-Decoder training paradigm, MaskMark-D introduces a simple masking mechanism during the decoding stage to support both global and local watermark extraction. A mask is applied to the watermarked image before extraction, allowing the decoder to focus on selected regions and learn local extraction. A localization module is also integrated into the decoder to identify watermarked regions during inference, reducing interference from irrelevant content and improving accuracy. MaskMark-ED extends this design by incorporating the mask into the encoding stage as well, guiding the encoder to embed the watermark in designated local regions for enhanced robustness. Comprehensive experiments show that MaskMark achieves state-of-the-art performance in global watermark extraction, local watermark extraction, watermark localization, and multi-watermark embedding. It outperforms all existing baselines, including WAM, the recent leading model for local watermarking, while preserving high visual quality of the watermarked images. MaskMark is also flexible: by adjusting the distortion layer, it can adapt to different robustness requirements with just a few steps of fine-tuning. Moreover, our approach is efficient and easy to optimize, requiring only 20 hours of training on a single A6000 GPU, 1/15 the computational cost of WAM.
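The two masking ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the real encoder and decoder are neural networks, and the abstract does not say how the mask is generated or applied. The sketch assumes a binary rectangular mask, element-wise multiplication, and an additive watermark residual. All function names (`random_box_mask`, `mask_before_decoding`, `embed_in_region`) are hypothetical.

```python
import numpy as np


def random_box_mask(height, width, rng, min_frac=0.25, max_frac=0.75):
    """Return a binary (height, width) mask with one random rectangle set to 1.

    Assumption: a single rectangle; the paper may sample masks differently.
    """
    box_h = int(height * rng.uniform(min_frac, max_frac))
    box_w = int(width * rng.uniform(min_frac, max_frac))
    top = rng.integers(0, height - box_h + 1)
    left = rng.integers(0, width - box_w + 1)
    mask = np.zeros((height, width), dtype=np.float32)
    mask[top:top + box_h, left:left + box_w] = 1.0
    return mask


def mask_before_decoding(watermarked, mask):
    """MaskMark-D idea: hide everything outside the selected region
    before the image is passed to the decoder (assumed: zeroing by multiplication)."""
    return watermarked * mask[..., None]


def embed_in_region(image, residual, mask):
    """MaskMark-ED idea: add the watermark signal only inside the masked region
    (assumed: additive residual gated by the mask)."""
    return image + residual * mask[..., None]


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(64, 64, 3)).astype(np.float32)
# Stand-in for an encoder output; a real encoder would derive this from the message.
residual = rng.normal(0.0, 0.01, size=image.shape).astype(np.float32)
mask = random_box_mask(64, 64, rng)

decoder_input = mask_before_decoding(image + residual, mask)  # global embed, local extract
locally_marked = embed_in_region(image, residual, mask)       # local embed
```

In this sketch, pixels outside the mask reach the decoder as zeros in the first case and are left identical to the cover image in the second. That mirrors the difference between extracting from a selected region and embedding only in a designated region.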