JND-Guided Neural Watermarking with Spatial Transformer Decoding for Screen-Capture Robustness

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of simultaneously achieving high visual quality and robustness in image watermarking under the complex distortions introduced by screen capture, such as moiré patterns, color-gamut shifts, perspective distortion, and sensor noise. The authors propose an end-to-end deep learning framework that jointly optimizes watermark embedding and extraction. Its key innovations are a physics-driven moiré pattern generator for realistic simulation of capture-induced degradations, an adaptive embedding-strength controller guided by a Just-Noticeable Difference (JND) perceptual model, and a fully automatic watermark localization and decoding mechanism that combines semantic segmentation with a symmetric noise template. Experiments show that the method embeds 127 bits of information while achieving a PSNR of 30.94 dB and an SSIM of 0.94, effectively balancing perceptual fidelity and robustness against screen-recapture attacks.
📝 Abstract
Screen-shooting robust watermarking aims to imperceptibly embed extractable information into host images such that the watermark survives the complex distortion pipeline of screen display and camera recapture. However, achieving high extraction accuracy while maintaining satisfactory visual quality remains an open challenge, primarily because the screen-shooting channel introduces severe and entangled degradations including Moiré patterns, color-gamut shifts, perspective warping, and sensor noise. In this paper, we present an end-to-end deep learning framework that jointly optimizes watermark embedding and extraction for screen-shooting robustness. Our framework incorporates three key innovations: (i) a comprehensive noise simulation layer that faithfully models realistic screen-shooting distortions -- notably including a physically-motivated Moiré pattern generator -- enabling the network to learn robust representations against the full spectrum of capture-channel noise through adversarial training; (ii) a Just Noticeable Distortion (JND) perceptual loss function that adaptively modulates watermark embedding strength by supervising the perceptual discrepancy between the JND coefficient map and the watermark residual, thereby concentrating watermark energy in perceptually insensitive regions to maximize visual quality; and (iii) two complementary automatic localization modules -- a semantic-segmentation-based foreground extractor for captured image rectification and a symmetric noise template mechanism for anti-cropping region recovery -- that enable fully automated watermark decoding under realistic deployment conditions. Extensive experiments demonstrate that our method achieves an average PSNR of 30.94 dB and SSIM of 0.94 on watermarked images while embedding 127-bit payloads.
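The "physically-motivated Moiré pattern generator" in innovation (i) can be understood through the classic beat-interference view of Moiré: superposing two near-identical periodic gratings (the screen's pixel grid and the camera sensor's sampling grid) produces a low-frequency beat pattern. The sketch below is only a hedged illustration of that idea in NumPy, not the paper's generator; the sinusoidal grating form, the frequency and angle offsets, and the blending strength are all illustrative assumptions.

```python
import numpy as np

def sinusoidal_grating(size, freq, angle):
    """A square 2-D sinusoidal grating with values in [0, 1]."""
    ys, xs = np.mgrid[0:size, 0:size]
    # Project pixel coordinates onto the grating's orientation.
    t = xs * np.cos(angle) + ys * np.sin(angle)
    return 0.5 + 0.5 * np.sin(2.0 * np.pi * freq * t / size)

def moire_pattern(size=256, freq=40.0, delta_freq=2.0, delta_angle=0.03):
    """Superpose two near-identical gratings; their low-frequency
    beat is the Moiré pattern (frequencies/angles are assumptions)."""
    g1 = sinusoidal_grating(size, freq, 0.0)
    g2 = sinusoidal_grating(size, freq + delta_freq, delta_angle)
    return 0.5 * (g1 + g2)  # average keeps values in [0, 1]

def add_moire(image, strength=0.08):
    """Blend a Moiré pattern into a [0, 1] grayscale square image,
    mimicking one component of screen-capture degradation."""
    m = moire_pattern(size=image.shape[0])
    return np.clip((1.0 - strength) * image + strength * m, 0.0, 1.0)
```

In a training pipeline like the one the abstract describes, a differentiable variant of such a generator would sit inside the noise simulation layer between encoder and decoder, so the decoder learns to extract bits despite the beat pattern.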
Problem

Research questions and friction points this paper is trying to address.

screen-shooting robustness
imperceptible watermarking
Moiré patterns
perceptual quality
watermark extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

JND-guided watermarking
screen-capture robustness
Moiré pattern simulation
spatial transformer decoding
perceptual loss
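The "perceptual loss" listed above corresponds, per the abstract, to supervising the discrepancy between the JND coefficient map and the watermark residual so that watermark energy concentrates in perceptually insensitive regions. A minimal sketch of one plausible form of such a loss follows, penalizing only residual magnitude that exceeds the per-pixel JND threshold; the paper's exact formulation is not given here, so `jnd_guided_loss` and its hinge form are illustrative assumptions.

```python
import numpy as np

def jnd_guided_loss(residual, jnd_map):
    """Hinge-style penalty on watermark energy above the visibility
    threshold: residual magnitude within the per-pixel JND budget is
    free, anything beyond it is penalized linearly."""
    excess = np.maximum(np.abs(residual) - jnd_map, 0.0)
    return float(np.mean(excess))
```

Minimizing this term alongside a bit-recovery loss would push the embedder to place large residuals only where the JND map says distortion is hard to notice (textured or bright regions), which is consistent with the PSNR/SSIM trade-off the paper reports.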
Jiayi Qin
School of Optical-Electrical and Computer Engineering, Zhejiang Gongshang University
Jingwei Li
School of Optical-Electrical and Computer Engineering, Zhejiang Gongshang University
Chuan Wu
Professor of Computer Science, The University of Hong Kong
cloud computing, distributed machine learning algorithms and systems