🤖 AI Summary
To address the vulnerability of existing model watermarking schemes to removal and ineffective provenance tracing under model stealing attacks, this paper proposes DeepTracer—a novel framework that deeply couples the watermarking task with the primary learning task, thereby forcing adversaries to inherit the watermark during functional model replication. Methodologically, DeepTracer integrates a multi-task learning framework with deep neural networks and a customized watermark embedding strategy. It introduces two key innovations: (i) a same-class coupling loss that enforces semantic consistency between watermark and main-task predictions, and (ii) a dynamic watermark sample filtering mechanism that strengthens the intrinsic binding between watermark and model. Extensive experiments across diverse datasets and model architectures demonstrate that DeepTracer preserves primary-task performance while remaining robust to a wide range of model stealing and watermark removal attacks, maintaining high detection accuracy and establishing new state-of-the-art defense performance.
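The core idea of coupling the watermark task to the main task can be illustrated with a minimal sketch of a joint training objective: both tasks share one label space and one loss, so minimizing the watermark term reshapes the same decision boundary the main task uses. This is an illustrative simplification, not the paper's exact loss; the weighting factor `lam` and the plain cross-entropy form are assumptions.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the true label for one example.
    return -math.log(probs[label])

def combined_loss(main_batch, wm_batch, lam=1.0):
    """Toy multi-task objective: main-task loss plus a watermark term
    computed with the SAME loss and label space, so the watermark task
    is entangled with the primary task rather than a separable add-on.

    main_batch / wm_batch: lists of (softmax_probs, label) pairs.
    lam: trade-off weight (hypothetical; the paper's formulation differs).
    """
    l_main = sum(cross_entropy(p, y) for p, y in main_batch) / len(main_batch)
    l_wm = sum(cross_entropy(p, y) for p, y in wm_batch) / len(wm_batch)
    return l_main + lam * l_wm
```

Because an adversary distilling the main task minimizes essentially the same objective on the shared boundary, the watermark behavior is carried along into the stolen model.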
📝 Abstract
Model watermarking techniques can embed watermark information into a protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are easily removed under model stealing attacks, making it difficult for model owners to verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods in model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer induces high coupling between the watermark task and the primary task, so that adversaries inevitably learn the hidden watermark task when stealing the primary-task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification, enhancing the reliability of the watermark. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks as well as watermark removal attacks, achieving new state-of-the-art effectiveness and robustness.
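The filtering mechanism mentioned above can be pictured with a small sketch: keep as key samples only those watermark inputs that the protected model already maps to the watermark label with high confidence, since these are the most reliable probes during ownership verification. The threshold `tau` and the confidence-based criterion are assumptions for illustration; the paper's actual selection rule may differ.

```python
def filter_key_samples(samples, predict, wm_label, tau=0.9):
    """Keep watermark samples the protected model classifies as the
    watermark label with confidence >= tau.

    samples: iterable of model inputs.
    predict: callable mapping an input to a list of class probabilities
             (hypothetical stand-in for the protected model).
    wm_label: index of the watermark target class.
    tau: assumed confidence threshold for a sample to count as a key sample.
    """
    kept = []
    for x in samples:
        probs = predict(x)
        top = max(range(len(probs)), key=probs.__getitem__)
        if top == wm_label and probs[wm_label] >= tau:
            kept.append(x)
    return kept
```

During verification, only these filtered key samples would be queried against a suspect model, reducing false negatives caused by weakly embedded watermark samples.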