AI Summary
This work addresses a vulnerability in the decoders of box-free model watermarking schemes, which typically operate in a black-box manner: an attacker can use query responses to obtain backpropagated gradients and train a watermark remover. To mitigate this, the authors propose Decoder Gradient Shields (DGSs), a family of defenses that reorient and rescale the gradients leaked through the watermark channel at the decoder's output (DGS-O), input (DGS-I), and intermediate layers (DGS-L). DGS-O admits a closed-form solution, and all DGS variants come with provable performance, preventing the remover's training from converging to the desired low-loss value. Experiments on image deraining and image generation tasks with state-of-the-art box-free watermarking show a 100% defense success rate under all tested settings while preserving the image quality of the decoder output.
Abstract
Box-free model watermarking has gained significant attention in deep neural network (DNN) intellectual property protection due to its model-agnostic nature and its ability to flexibly handle high-entropy image outputs from generative models. Typically operating in a black-box manner, it employs an encoder-decoder framework for watermark embedding and extraction. While existing research has focused primarily on making the encoders robust to various attacks, the decoders have been largely overlooked, leaving the watermark open to attack. In this paper, we identify one such attack against the decoder, in which query responses are used to obtain backpropagated gradients for training a watermark remover. To address this issue, we propose Decoder Gradient Shields (DGSs), a family of defense mechanisms comprising DGS at the output (DGS-O), at the input (DGS-I), and in the layers (DGS-L) of the decoder, with a closed-form solution for DGS-O and provable performance for all DGSs. By jointly reorienting and rescaling the gradients obtained from gradient-leaking queries on the watermark channel, the proposed DGSs prevent the watermark remover's training from converging to the desired low-loss value while preserving the image quality of the decoder output. We demonstrate the effectiveness of the proposed DGSs in diverse application scenarios. Our experimental results on deraining and image generation tasks with state-of-the-art box-free watermarking show that our DGSs achieve a defense success rate of 100% under all settings.
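To make the intuition behind "reorienting and rescaling" concrete, here is a minimal toy sketch in plain Python. It is not the authors' DGS construction or its closed-form solution, which the abstract does not give. It assumes a hypothetical quadratic loss and a hypothetical shield that flips the sign of the gradient and shrinks its magnitude; the names `shielded_gradient`, `scale`, and `descend` are illustrative only. The point is simply that gradient descent driven by such a modified gradient fails to reach a low loss, whereas descent on the true gradient converges.

```python
def loss(x, target):
    """Toy quadratic loss standing in for the remover's training loss."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def true_gradient(x, target):
    """Gradient of the toy loss with respect to x."""
    return [xi - ti for xi, ti in zip(x, target)]


def shielded_gradient(x, target, scale=1e-3):
    """Hypothetical shield (an assumption, not the paper's formula):
    reverse the gradient direction and shrink its magnitude."""
    return [-scale * g for g in true_gradient(x, target)]


def descend(grad_fn, x0, target, lr=0.1, steps=200):
    """Plain gradient descent using whatever gradient the attacker receives."""
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x, target)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return loss(x, target)


target = [1.0, -2.0, 0.5]
x0 = [0.0, 0.0, 0.0]

initial_loss = loss(x0, target)
honest_loss = descend(true_gradient, x0, target)
shielded_loss = descend(shielded_gradient, x0, target)

print("initial loss: ", initial_loss)
print("honest loss:  ", honest_loss)    # converges toward zero
print("shielded loss:", shielded_loss)  # does not fall below the initial loss
```

In this toy setting the sign flip keeps the optimizer from descending, and the small `scale` keeps the modification gentle. The actual DGSs must additionally preserve the image quality of the decoder output, which this sketch does not model.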