Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering

📅 2025-07-23
🤖 AI Summary
This work exposes a critical security vulnerability in “box-free watermarking” for deep generative networks (GNets). In this setting, a hidden network (HNet) embeds the watermark into GNet outputs, both networks are encapsulated in a black-box operation network (ONet), and only watermarked outputs are released. Even so, attackers can accurately reconstruct the original, unwatermarked GNet outputs via query-based reverse engineering. To exploit this, we first reverse-engineer an inverse model of HNet from curated queries. We then propose a query-driven forward surrogate model of HNet that leverages the equivalent additive property of box-free watermarking and better preserves image quality. Extensive experiments on image processing and image generation tasks show a 100% watermark removal success rate and a peak PSNR of 34.69 dB, substantially outperforming existing attacks. The results show that box-free watermarking is insecure even when the generative model is deployed as a black box.
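
Image quality in the summary and abstract is reported as PSNR. For reference, the conventional PSNR definition is sketched below in Python with NumPy. The sketch assumes 8-bit images (peak value 255). The paper's exact evaluation protocol, such as color space, cropping, or averaging over a test set, is not stated here, so treat this as an illustration only.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images of equal shape.

    Assumes pixel values on a scale whose maximum is `peak` (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For example, two images that differ by a constant 5 gray levels have an MSE of 25 and a PSNR of about 34.15 dB.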

📝 Abstract
The intellectual property of deep generative networks (GNets) can be protected using a cascaded hiding network (HNet) that embeds watermarks (or marks) into GNet outputs, an approach known as box-free watermarking. Both GNet and HNet are encapsulated in a black box (called the operation network, or ONet), and only the generated and marked outputs from HNet are released to end users, so such systems are deemed secure. In this paper, we reveal an overlooked vulnerability in them. Specifically, we show that the hidden GNet outputs can still be reliably estimated via query-based reverse engineering, leaking the generated and unmarked images, despite the attacker's limited knowledge of the system. Our first attempt is to reverse-engineer an inverse model for HNet under the stringent black-box condition, for which we propose to exploit the query process with specially curated input images. While effective, this method yields unsatisfactory image quality. To improve on it, we propose an alternative method that leverages the equivalent additive property of box-free model watermarking and reverse-engineers a forward surrogate model of HNet, which better preserves image quality. Extensive experimental results on image processing and image generation tasks demonstrate that both attacks achieve a 100% watermark removal success rate while maintaining excellent image quality (a PSNR of up to 34.69 dB), substantially outperforming existing attacks. These results highlight the urgent need for robust defensive strategies to mitigate the identified vulnerability in box-free model watermarking.
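
The first attack trains an inverse model that maps marked outputs back to unmarked ones. The toy sketch below illustrates only that idea and is not the authors' implementation. `toy_hnet` is a hypothetical affine stand-in for HNet. The curated-query step is abstracted away by assuming the attacker already holds (unmarked, marked) pairs. A least-squares affine map replaces the neural inverse model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hypothetical toy "image" size (flattened pixels)

# Hypothetical stand-in for HNet: unmarked input plus a small input-dependent
# residual and a fixed pattern. A real HNet is a nonlinear neural network.
W = 0.05 * rng.standard_normal((D, D))
pattern = 0.1 * rng.standard_normal(D)

def toy_hnet(y: np.ndarray) -> np.ndarray:
    """Toy marking function (NOT the paper's HNet); y has shape (n, D)."""
    return y + y @ W + pattern

def fit_inverse(marked: np.ndarray, unmarked: np.ndarray) -> np.ndarray:
    """Least-squares affine map from marked images back to unmarked ones."""
    X = np.hstack([marked, np.ones((marked.shape[0], 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, unmarked, rcond=None)
    return coef

def apply_inverse(coef: np.ndarray, marked: np.ndarray) -> np.ndarray:
    """Apply the fitted inverse model to a batch of marked images."""
    X = np.hstack([marked, np.ones((marked.shape[0], 1))])
    return X @ coef

# Assumed attacker data: (unmarked, marked) pairs. How such pairs are obtained
# from a black-box ONet (e.g., via curated inputs) is abstracted away here.
y_train = rng.uniform(0.0, 1.0, size=(200, D))
coef = fit_inverse(toy_hnet(y_train), y_train)

# Remove the mark from fresh, unseen marked outputs.
y_test = rng.uniform(0.0, 1.0, size=(10, D))
recovered = apply_inverse(coef, toy_hnet(y_test))
print("max abs recovery error:", np.max(np.abs(recovered - y_test)))
```

Because the toy HNet is exactly affine and invertible, the least-squares inverse here is essentially exact. A real HNet is nonlinear, so a learned inverse would only approximate it.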
Problem

Reverse-engineering black-box watermarked generative model outputs
Leaking unmarked images via query-based vulnerability exploitation
Improving watermark removal success while preserving image quality
Innovation

Query-based reverse engineering for watermark removal
Reverse-engineer inverse model with curated inputs
Leverage additive property for forward surrogate model
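
The second attack relies on the equivalent additive property: a marked output can be written as the unmarked image plus a residual. One plausible way to exploit a forward surrogate under that view is sketched below. It is an assumption-laden toy, not the paper's procedure. `toy_hnet`, `surrogate_residual`, and `remove_mark` are hypothetical names. The surrogate is a least-squares affine fit rather than a neural network, and the unmarked image is recovered by fixed-point iteration on m = y + r(y).

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # hypothetical flattened image size

# Hypothetical toy HNet in additive form: marked = unmarked + residual(unmarked).
W_true = 0.05 * rng.standard_normal((D, D))
pattern = 0.1 * rng.standard_normal(D)

def toy_hnet(y: np.ndarray) -> np.ndarray:
    """Toy marking function (NOT the paper's HNet); y has shape (n, D)."""
    return y + y @ W_true + pattern

# Step 1: fit a forward surrogate of the residual r(y) = HNet(y) - y
# from assumed query pairs (y_q, HNet(y_q)).
y_q = rng.uniform(0.0, 1.0, size=(200, D))
X = np.hstack([y_q, np.ones((len(y_q), 1))])  # append bias column
coef, *_ = np.linalg.lstsq(X, toy_hnet(y_q) - y_q, rcond=None)

def surrogate_residual(y: np.ndarray) -> np.ndarray:
    """Surrogate estimate of the watermark residual added to y."""
    return np.hstack([y, np.ones((len(y), 1))]) @ coef

# Step 2: estimate the unmarked image by iterating y <- m - r(y),
# whose fixed point satisfies m = y + r(y).
def remove_mark(marked: np.ndarray, iters: int = 100) -> np.ndarray:
    y = marked.copy()  # start from the marked image itself
    for _ in range(iters):
        y = marked - surrogate_residual(y)
    return y

y_test = rng.uniform(0.0, 1.0, size=(10, D))
recovered = remove_mark(toy_hnet(y_test))
print("max abs recovery error:", np.max(np.abs(recovered - y_test)))
```

The fixed-point update converges here because the toy residual is a small linear map (a contraction). With a nonlinear surrogate f, one would more typically minimize ||f(y) - m||² over y by gradient descent.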
Haonan An
Department of Computer Science, City University of Hong Kong, Hong Kong
Guang Hua
Infocomm Technology and Engineering Cluster, Singapore Institute of Technology, Singapore 828608
Hangcheng Cao
City University of Hong Kong
Internet of Things & Security
Zhengru Fang
Department of Computer Science, City University of Hong Kong, Hong Kong
Guowen Xu
Professor, SMIEEE, University of Electronic Science and Technology of China
Applied Cryptography, Computer Security, AI Security and Privacy
Susanto Rahardja
Infocomm Technology and Engineering Cluster, Singapore Institute of Technology, Singapore 828608
Yuguang Fang
Department of Computer Science, City University of Hong Kong, Hong Kong