Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

📅 2025-07-15
🤖 AI Summary
Deep neural network watermarking (NNW) suffers from poor robustness against forgery and overwrite attacks, particularly for weight-based methods vulnerable to parameter tampering. This paper proposes NeuralMark, the first framework to introduce a hash-based watermark filtering mechanism: an irreversible binary hash watermark acts as a parameter selector, tightly coupling the watermark with model weights; average pooling is further integrated to enhance resilience against fine-tuning and pruning. NeuralMark is architecture-agnostic—compatible with both CNNs and Transformers—and supports diverse tasks, including image classification and text generation. Evaluated across 13 mainstream models, NeuralMark achieves significant improvements in robustness against forgery, overwrite, and compression attacks, while incurring minimal accuracy degradation (<1.2%). A formal security analysis is provided, establishing theoretical guarantees for watermark integrity and unforgeability.

📝 Abstract
As valuable digital assets, deep neural networks necessitate robust ownership protection, positioning neural network watermarking (NNW) as a promising solution. Among various NNW approaches, weight-based methods are favored for their simplicity and practicality; however, they remain vulnerable to forging and overwriting attacks. To address these challenges, we propose NeuralMark, a robust method built around a hashed watermark filter. Specifically, we utilize a hash function to generate an irreversible binary watermark from a secret key, which is then used as a filter to select the model parameters for embedding. This design tightly intertwines the embedding parameters with the hashed watermark, providing a robust defense against both forging and overwriting attacks. Average pooling is also incorporated to resist fine-tuning and pruning attacks. Furthermore, NeuralMark can be seamlessly integrated into various neural network architectures, ensuring broad applicability. Theoretically, we analyze its security boundary. Empirically, we verify its effectiveness and robustness across 13 distinct Convolutional and Transformer architectures, covering five image classification tasks and one text generation task. The source codes are available at https://github.com/AIResearch-Group/NeuralMark.
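The core mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of SHA-256, the counter-based bit expansion, and the function names are all assumptions made for illustration. The idea is that a secret key is hashed into an irreversible binary watermark, and those same bits then act as a filter selecting which (flattened) weight positions participate in embedding, coupling the watermark to the parameters it lives in.

```python
import hashlib
import numpy as np

def hashed_watermark(secret_key: bytes, n_bits: int = 256) -> np.ndarray:
    """Derive an irreversible binary watermark from a secret key.

    Sketch: SHA-256 digests of (key || counter) are concatenated and
    unpacked into bits until n_bits are available. An attacker cannot
    invert the hash to craft a key matching a chosen watermark.
    """
    digest = b""
    counter = 0
    while len(digest) * 8 < n_bits:
        digest += hashlib.sha256(
            secret_key + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:n_bits]

def select_embedding_params(weights: np.ndarray,
                            watermark: np.ndarray) -> np.ndarray:
    """Use the watermark bits as a filter over the flattened weights:
    only positions where the (tiled) watermark bit is 1 are selected
    for embedding, intertwining watermark and parameters."""
    flat = weights.ravel()
    mask = np.resize(watermark, flat.shape).astype(bool)
    return flat[mask]
```

Because the filter positions are themselves a function of the hash, a forger who embeds a different watermark necessarily touches a different parameter subset, which is the intuition behind the claimed defense against forging and overwriting.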
Problem

Research questions and friction points this paper is trying to address.

Defends against forging and overwriting attacks in neural networks
Enhances robustness of weight-based watermarking methods
Ensures broad applicability across diverse neural architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hashed watermark filter for robust defense
Irreversible binary watermark from secret key
Average pooling resists fine-tuning and pruning
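The average-pooling idea in the last bullet can be sketched as follows. This is a hypothetical illustration, not the paper's code: the pool size and the sign-threshold decoding are assumptions. Averaging groups of selected parameters before extraction means small per-weight perturbations from fine-tuning or pruning tend to cancel out, so the decoded bits stay stable.

```python
import numpy as np

def pooled_features(selected: np.ndarray, pool_size: int = 4) -> np.ndarray:
    """Average-pool the selected parameters in fixed-size groups so that
    small per-weight perturbations average out before extraction."""
    n = (len(selected) // pool_size) * pool_size  # drop the ragged tail
    return selected[:n].reshape(-1, pool_size).mean(axis=1)

def extract_bits(pooled: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Hypothetical threshold decoding: each pooled value above the
    threshold reads as bit 1, otherwise bit 0."""
    return (pooled > threshold).astype(np.uint8)
```

Under this sketch, a pruning attack that zeroes a minority of weights in a pool shifts the pooled mean only slightly, leaving the thresholded bit unchanged in most cases.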