Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
In ML-as-a-Service (MLaaS) settings, deployed models are vulnerable to model extraction attacks, while existing watermarking techniques suffer from limited deployment flexibility, high computational overhead, and weak resilience against adaptive attacks. Method: This paper proposes a training-free, plug-and-play model watermarking framework. It introduces the first information-theoretic formulation of watermark transmission, designs a similarity-based watermarking mechanism that requires no model fine-tuning or retraining, and employs a distributed multi-step watermark propagation strategy, jointly achieving high deployment flexibility and strong robustness at zero training cost. Contribution/Results: Copyright verification is enabled via probability-distribution matching and rigorous t-test statistical validation. Under worst-case assumptions, the number of verification samples required for a copyright claim drops from 12,000 in state-of-the-art methods to only 200, a significant improvement in verification efficiency and reliability.
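The t-test-based verification described above can be illustrated with a minimal sketch: compare how strongly a suspect model agrees with the watermarked output distribution against a baseline, and claim ownership only if the difference is statistically significant. The function names, agreement scores, and the decision threshold below are hypothetical illustrations, not the paper's actual implementation.

```python
import math
import random
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t-statistic: how far apart are the group means,
    in units of their combined standard error?"""
    ma, mb = statistics.fmean(sample_a), statistics.fmean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / len(sample_a) + vb / len(sample_b))
    return (ma - mb) / se

def claims_copyright(watermark_scores, baseline_scores, threshold=3.0):
    """Hypothetical decision rule: claim ownership only if watermark
    agreement is significantly above the independent-model baseline."""
    return welch_t(watermark_scores, baseline_scores) > threshold

random.seed(0)
# Hypothetical per-query watermark-agreement scores on 200 verification
# samples: an extracted model inherits the watermarked distribution
# (high agreement), an independently trained model does not.
stolen = [random.gauss(0.80, 0.05) for _ in range(200)]
clean = [random.gauss(0.50, 0.05) for _ in range(200)]

print(claims_copyright(stolen, clean))  # significant gap: ownership claimed
print(claims_copyright(clean, clean))   # t-statistic is zero: no claim
```

With a 200-sample budget and a clear separation between the two distributions, the t-statistic is far above any reasonable significance threshold, which is the intuition behind the reported drop in required verification samples.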

📝 Abstract
Developing high-performance deep learning models is resource-intensive, leading model owners to utilize Machine Learning as a Service (MLaaS) platforms instead of publicly releasing their models. However, malicious users may exploit query interfaces to execute model extraction attacks, reconstructing the target model's functionality locally. While prior research has investigated triggerable watermarking techniques for asserting ownership, existing methods face significant challenges: (1) most approaches require additional training, resulting in high overhead and limited flexibility, and (2) they often fail to account for advanced attackers, leaving them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a robust plug-and-play watermarking framework against model extraction attacks. We first formulate a watermark transmission model from an information-theoretic perspective, providing an interpretable account of the principles and limitations of existing triggerable watermarking. Guided by the model, we further introduce: (1) a similarity-based training-free watermarking method for plug-and-play and flexible watermarking, and (2) a distribution-based multi-step watermark information transmission strategy for robust watermarking. Comprehensive experiments on four datasets demonstrate that Neural Honeytrace outperforms previous methods in efficiency and resistance to adaptive attacks. Neural Honeytrace reduces the average number of samples required for a worst-case t-Test-based copyright claim from $12,000$ to $200$ with zero training cost.
Problem

Research questions and friction points this paper is trying to address.

AI Model Theft
Watermarking Techniques
Advanced Attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Honeytrace
similarity-based watermarking
multi-step watermark distribution
Yixiao Xu
Beijing University of Posts and Telecommunications
AI Security, adversarial example, backdoor attack
Binxing Fang
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University
Rui Wang
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University
Yinghai Zhou
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University
Shouling Ji
Professor, Zhejiang University & Georgia Institute of Technology
Data-driven Security, AI Security, Software Security, Privacy
Yuan Liu
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University
Mohan Li
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University
Zhihong Tian
Cyberspace Institute of Advanced Technology, Guangzhou University; Huangpu Research School of Guangzhou University