Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address unauthorized replication of deep neural network (DNN) models in cloud environments—where owners struggle to verify model ownership without exposing their proprietary models—this paper proposes a probabilistic watermarking scheme for the gray-box setting. The method introduces controllable output-probability manipulation into DNN watermarking for the first time, using a parameterized iterative variant of the FGSM attack to drive the prediction probability of a target class to a predefined value; the owner's white-box knowledge of the original model is what makes this precise control possible. Crucially, verification never requires presenting the original model, so both the rightful owner and a third party can confirm model identity. Evaluated on CIFAR-10 and an ImageNet subset, the approach achieves >99.2% watermark detection accuracy, demonstrates strong robustness against fine-tuning and pruning attacks, and incurs <50 ms verification latency per sample.

📝 Abstract
In this paper, we propose a novel framework for ownership verification of deep neural network (DNN) models for image classification tasks. It allows both the rightful owner and a third party to verify model identity without presenting the original model. We assume a gray-box scenario in which an unauthorized user operates a model illegally copied from the original and provides a service in a cloud environment; users submit images and receive the classification results as a probability distribution over the output classes. The framework applies a white-box adversarial attack to align the output probability of a specific class with a designated value. The owner's knowledge of the original model makes it possible to generate such adversarial examples. We propose a simple but effective adversarial attack method based on the iterative Fast Gradient Sign Method (FGSM) by introducing control parameters. Experimental results confirm the effectiveness of DNN model identification using adversarial attacks.
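The attack the abstract describes—an iterative FGSM that steers the probability of a chosen class toward a designated value—can be sketched on a toy differentiable classifier. Everything below is a hypothetical illustration, not the authors' implementation: the linear-softmax model, the error-scaled sign step used as a stand-in for the paper's control parameters, and all function names are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def embed_target_probability(W, x, t, target_p, alpha=0.01, iters=2000, tol=1e-3):
    """Iteratively perturb x so that class t's softmax probability under a
    toy linear model (logits = W @ x) reaches target_p.  The step is an
    FGSM-style sign step scaled by the remaining probability error, a
    hypothetical stand-in for the paper's control parameters."""
    x = x.copy()
    for _ in range(iters):
        p = softmax(W @ x)
        err = p[t] - target_p
        if abs(err) < tol:
            break
        # analytic gradient of p[t] w.r.t. x for softmax(W @ x)
        dpt_dx = p[t] * (W[t] - p @ W)
        # error-scaled sign step: moves p[t] toward target_p, shrinking
        # as the probability approaches the designated value
        x -= alpha * err * np.sign(dpt_dx)
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))          # toy 10-class linear classifier
x = rng.normal(size=32)                # toy "image"
x_adv = embed_target_probability(W, x, t=3, target_p=0.8)
```

On a real DNN the same loop would use backpropagated gradients of the target-class probability instead of the closed-form `dpt_dx`; the error-scaled step keeps the probability from oscillating around the designated value, which a fixed-magnitude sign step would do.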
Problem

Research questions and friction points this paper is trying to address.

Verify DNN ownership using white-box adversarial attacks
Enable model identity verification without original model
Control output probability for specific adversarial examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

White-box adversarial attack for ownership verification
Iterative FGSM with control parameters
Specific probability manipulation technique
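Conceptually, gray-box verification then reduces to querying the suspect cloud service with the watermark key images and checking whether the returned probabilities match the pre-specified values. The sketch below is a hypothetical illustration of that check, not the paper's protocol: `query_model`, `keys`, and the match threshold are all assumed names and values.

```python
def verify_ownership(query_model, keys, target_p, tol=0.05):
    """Gray-box ownership check (illustrative sketch).

    `keys` pairs each watermark image with its designated class; a key
    "matches" when the service's returned probability for that class is
    within `tol` of the pre-specified value `target_p`.  Returns the
    fraction of matching keys."""
    hits = sum(abs(query_model(img)[cls] - target_p) <= tol
               for img, cls in keys)
    return hits / len(keys)

# toy stand-in for the cloud API: always returns a fixed distribution
suspect = lambda img: [0.79, 0.10, 0.11]
rate = verify_ownership(suspect, [("key_a", 0), ("key_b", 1)], target_p=0.8)
```

A high match rate across many independent keys is strong evidence the service runs the watermarked model, since an unrelated model has no reason to reproduce the designated probabilities.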
Teruki Sano
Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Minoru Kuribayashi
Tohoku University
Multimedia Security, Multimedia Forensics, Information Security, Image Processing
Masao Sakai
Center for Data-driven Science and Artificial Intelligence, Tohoku University, Sendai, Japan
Shuji Isobe
Center for Data-driven Science and Artificial Intelligence, Tohoku University, Sendai, Japan
Eisuke Koizumi
Center for Data-driven Science and Artificial Intelligence, Tohoku University, Sendai, Japan