An Information Asymmetry Game for Trigger-based DNN Model Watermarking

📅 2025-10-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep neural network (DNN) watermarks are vulnerable to removal via pruning, fine-tuning, and other adversarial attacks, undermining intellectual property (IP) protection. Method: This paper proposes the first trigger-based watermarking framework grounded in information-asymmetric game theory. By formally modeling attacker and defender strategies and associated costs, watermark robustness is recast as an optimal defense problem under Nash equilibrium. The approach integrates private trigger-set design, sparse watermark embedding, and knowledge-hiding mechanisms. Contribution/Results: It establishes, for the first time, an exponential lower bound on watermark detection accuracy under equilibrium. Empirical evaluation shows >98% detection rates across ResNet and ViT models, with robust verification persisting even after aggressive pruning (50% of parameters) or fine-tuning, while incurring negligible main-task accuracy degradation (<0.5%). This significantly enhances the trustworthiness and practicality of DNN IP protection.

📝 Abstract

As valuable digital products, deep neural networks (DNNs) face increasingly severe threats to their intellectual property, making it necessary to develop effective technical measures to protect them. Trigger-based watermarking methods achieve copyright protection by embedding triggers into the host DNNs. However, an attacker may remove the watermark by pruning or fine-tuning. We model this interaction as a game under information asymmetry: the defender embeds a secret watermark using private knowledge, while the attacker can access only the watermarked model and seeks to remove the watermark. We define strategies, costs, and utilities for both players, derive the attacker's optimal pruning budget, and establish an exponential lower bound on the accuracy of watermark detection after an attack. Experimental results demonstrate the feasibility of the watermarked model and indicate that sparse watermarking can resist removal with negligible accuracy loss. This study highlights the effectiveness of game-theoretic analysis in guiding the design of robust watermarking schemes for model copyright protection.
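To make the trigger-verification and pruning-attack setting concrete, here is a minimal sketch in plain Python. It is not the paper's method: the tiny linear classifier, the weight values, the trigger inputs, the 0.9 decision threshold, and every function name (`predict`, `trigger_accuracy`, `magnitude_prune`, `is_watermarked`) are assumptions chosen only for illustration. The defender checks how often the model returns the secret labels on a private trigger set, and the attacker zeroes out the smallest-magnitude weights up to a pruning budget.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Trigger = Tuple[Sequence[float], int]  # (trigger input, secret label)


def predict(weights: Matrix, x: Sequence[float]) -> int:
    """Toy linear classifier: return the index of the highest-scoring class.

    Ties resolve to the lowest class index.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def trigger_accuracy(weights: Matrix, triggers: Sequence[Trigger]) -> float:
    """Fraction of private triggers on which the model returns the secret label."""
    hits = sum(1 for x, label in triggers if predict(weights, x) == label)
    return hits / len(triggers)


def magnitude_prune(weights: Matrix, budget: float) -> Matrix:
    """Removal attack: zero the `budget` fraction of smallest-magnitude weights."""
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    pruned = [row[:] for row in weights]  # leave the original model untouched
    for _, i, j in ranked[: int(budget * len(ranked))]:
        pruned[i][j] = 0.0
    return pruned


def is_watermarked(weights: Matrix, triggers: Sequence[Trigger],
                   threshold: float = 0.9) -> bool:
    """Ownership claim succeeds if trigger accuracy reaches the (assumed) threshold."""
    return trigger_accuracy(weights, triggers) >= threshold


# Hypothetical 2-class, 4-feature model and a 4-sample private trigger set.
model = [[0.9, 0.1, 0.8, 0.05],
         [0.1, 0.9, 0.05, 0.7]]
triggers = [([1, 0, 1, 0], 0), ([0, 1, 0, 1], 1),
            ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 1)]

for budget in (0.0, 0.5, 0.75):
    attacked = magnitude_prune(model, budget)
    print(budget, trigger_accuracy(attacked, triggers),
          is_watermarked(attacked, triggers))
```

In this toy setup the trigger accuracy stays at 1.0 when half of the weights are pruned and drops to 0.75 at a 75% budget, which falls below the assumed threshold. A real evaluation would use a trained DNN and would also measure main-task accuracy, since heavy pruning costs the attacker model quality.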

Problem

Research questions and friction points this paper is trying to address.

Modeling DNN watermarking as an information asymmetry game

Deriving optimal attack strategies and detection bounds

Designing sparse watermarks resistant to removal attacks
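The idea of an attacker's optimal pruning budget can be illustrated with a toy best-response computation. Everything in this sketch is an assumption made for illustration and is not taken from the paper: the exponential detection curve `exp(-decay * budget)`, the linear attack cost, the utility form, and the parameter names `gain`, `decay`, and `cost`. Under those assumptions the attacker's utility is concave in the budget, so the optimum has a closed form that a grid search can cross-check.

```python
import math


def detection_accuracy(budget: float, decay: float) -> float:
    """ASSUMED toy curve: detection accuracy decays exponentially with the budget."""
    return math.exp(-decay * budget)


def attacker_utility(budget: float, gain: float, decay: float, cost: float) -> float:
    """ASSUMED utility: reward for lost detection minus a linear pruning cost."""
    return gain * (1.0 - detection_accuracy(budget, decay)) - cost * budget


def best_budget_closed_form(gain: float, decay: float, cost: float) -> float:
    """Solve dU/dp = gain * decay * exp(-decay * p) - cost = 0, clipped to [0, 1]."""
    if gain * decay <= cost:
        return 0.0  # marginal gain never exceeds marginal cost: do not prune
    return min(1.0, math.log(gain * decay / cost) / decay)


def best_budget_grid(gain: float, decay: float, cost: float,
                     steps: int = 1000) -> float:
    """Brute-force check: evaluate the utility on an evenly spaced grid in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: attacker_utility(p, gain, decay, cost))


# Hypothetical parameters, chosen only to exercise the functions.
p_star = best_budget_closed_form(gain=1.0, decay=2.0, cost=1.0)
print(p_star, best_budget_grid(gain=1.0, decay=2.0, cost=1.0))
print(detection_accuracy(p_star, decay=2.0))
```

In this toy model a smaller `decay` stands in for a watermark that is harder to prune away: once `gain * decay` falls to `cost` or below, the attacker's best response is not to prune at all.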

Innovation

Methods, ideas, or system contributions that make the work stand out.

Game-theoretic modeling for DNN watermarking

Sparse watermarking resists removal attacks

Exponential lower bound for detection accuracy