An Information Asymmetry Game for Trigger-based DNN Model Watermarking

📅 2025-10-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Deep neural network (DNN) watermarks are vulnerable to removal via pruning, fine-tuning, and other adversarial attacks, undermining intellectual property (IP) protection. Method: This paper proposes the first trigger-based watermarking framework grounded in information-asymmetric game theory. By formally modeling attacker and defender strategies and associated costs, watermark robustness is recast as an optimal defense problem under Nash equilibrium. The approach integrates private trigger-set design, sparse watermark embedding, and knowledge-hiding mechanisms. Contribution/Results: It establishes, for the first time, an exponential lower bound on watermark detection accuracy under equilibrium. Empirical evaluation shows >98% detection rates across ResNet and ViT models, with robust verification persisting even after aggressive pruning (50% parameters) or fine-tuning, while incurring negligible main-task accuracy degradation (<0.5%). This significantly enhances the trustworthiness and practicality of DNN IP protection.

📝 Abstract
As valuable digital products, deep neural networks (DNNs) face increasingly severe threats to their intellectual property, making it necessary to develop effective technical measures to protect them. Trigger-based watermarking methods achieve copyright protection by embedding triggers into the host DNN. However, an attacker may remove the watermark by pruning or fine-tuning. We model this interaction as a game under information asymmetry: the defender embeds a secret watermark using private knowledge, while the attacker can access only the watermarked model and seeks to remove the watermark. We define strategies, costs, and utilities for both players, derive the attacker's optimal pruning budget, and establish an exponential lower bound on the accuracy of watermark detection after an attack. Experimental results demonstrate the feasibility of the watermarked model and indicate that sparse watermarking can resist removal with negligible accuracy loss. This study highlights the effectiveness of game-theoretic analysis in guiding the design of robust watermarking schemes for model copyright protection.
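To make the setting in the abstract concrete, here is a minimal sketch of trigger-based verification under a pruning attack. It is not the paper's method: a linear perceptron stands in for the host DNN, and every name (`make_trigger_set`, `embed_watermark`, `magnitude_prune`, `detection_accuracy`) and parameter value is a hypothetical choice made for illustration. The defender's private knowledge is the seed that generates the trigger set; the attacker sees only the weights and zeroes out the smallest-magnitude ones up to a chosen budget.

```python
import random


def make_trigger_set(n, dim, seed):
    """Defender's private trigger set: random +/-1 inputs with secret labels."""
    rng = random.Random(seed)  # the seed plays the role of private knowledge
    xs = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n)]
    ys = [rng.choice((0, 1)) for _ in range(n)]
    return xs, ys


def predict(w, x):
    """Toy stand-in for the host model: a linear classifier without bias."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def embed_watermark(xs, ys, dim, epochs=500):
    """Perceptron updates until the model memorizes the trigger labels."""
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if predict(w, x) != y:
                mistakes += 1
                sign = 1.0 if y == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break
    return w


def magnitude_prune(w, fraction):
    """Attacker's move: zero the given fraction of smallest-|w| weights."""
    k = int(len(w) * fraction)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    pruned = list(w)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def detection_accuracy(w, xs, ys):
    """Defender's verification: fraction of triggers still labeled as embedded."""
    return sum(predict(w, x) == y for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    dim = 64
    xs, ys = make_trigger_set(n=20, dim=dim, seed=7)
    w = embed_watermark(xs, ys, dim)
    for budget in (0.0, 0.25, 0.5, 0.75):
        acc = detection_accuracy(magnitude_prune(w, budget), xs, ys)
        print(f"pruning budget {budget:.2f}: detection accuracy {acc:.2f}")
```

The sketch omits the main task entirely, so it says nothing about the attacker's cost (the loss of task accuracy caused by pruning), which is the quantity the game-theoretic analysis trades off against watermark removal.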
Problem

Research questions and friction points this paper is trying to address.

Modeling DNN watermarking as an information asymmetry game
Deriving optimal attack strategies and detection bounds
Designing sparse watermarks resistant to removal attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Game-theoretic modeling for DNN watermarking
Sparse watermarking resists removal attacks
Exponential lower bound for detection accuracy
Chaoyue Huang
School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
Gejian Zhao
School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
Hanzhou Wu
Shanghai University / Guizhou Normal University
AI Security, Multimedia Security, Multimedia Forensics, Signal Processing, Large Language Models
Zhihua Xia
Jinan University
Digital Forensics
Asad Malik
School of Information Technology, Monash University Malaysia, Bandar Sunway 47500, Malaysia