P2Mark: Plug-and-play Parameter-intrinsic Watermarking for Neural Speech Generation

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In open-source white-box settings, neural speech synthesis models are vulnerable to watermark removal, hindering reliable copyright attribution. To address this, we propose a parameter-intrinsic watermarking mechanism that embeds watermarks directly into trainable model parameters rather than into the output audio, tightly coupling the watermark with the weights and making it intrinsically hard to remove. Our approach employs differentiable adapters for end-to-end joint optimization of watermark embedding and model functionality. It is compatible with both vocoder- and codec-based decoders and supports cross-architecture deployment. Experiments demonstrate state-of-the-art performance in watermark detection accuracy, perceptual imperceptibility, and robustness against removal attacks. Notably, this is the first watermarking framework to provide reliable model-level copyright tracing and protection under open-source white-box conditions.

📝 Abstract
Recently, a large number of advanced neural speech generation methods have emerged in the open-source community. Although this has facilitated the application and development of the technology, it has also made it harder to prevent the abuse of generated speech and to protect copyrights. Audio watermarking is an effective method for proactively protecting generated speech, but when the source code and model weights of a neural speech generation method are open-sourced, audio watermarks produced by previous watermarking methods can be easily removed or manipulated. This paper proposes a Plug-and-play Parameter-intrinsic WaterMarking (P2Mark) method for protecting neural speech generation systems. The main advantage of P2Mark is that the watermark information is flexibly integrated into the neural speech generation model in the form of parameters, by training a watermark adapter, rather than being injected into the model in the form of features. Once the watermark adapter carrying the watermark embedding is merged with the pre-trained generation model, the watermark information cannot be easily removed or manipulated. P2Mark is therefore a reliable choice for proactively tracing and protecting the copyrights of neural speech generation models in open-source white-box scenarios. We validated P2Mark on the two main types of decoders used in neural speech generation: vocoders and codecs. Experimental results show that, in terms of watermark extraction accuracy, watermark imperceptibility, and robustness, P2Mark achieves performance comparable to state-of-the-art audio watermarking methods, even though those methods cannot be used in open-source white-box protection scenarios.
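To make the "merge the adapter into the parameters" idea concrete, here is a minimal numpy sketch. It assumes the watermark adapter is a low-rank weight update (LoRA-style); the paper's actual adapter architecture and training objective may differ. The point it illustrates is that, after merging, the released checkpoint contains a single weight matrix with no separable watermark component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for one linear layer of a speech decoder.
d_out, d_in, rank = 8, 8, 2
W = rng.standard_normal((d_out, d_in))       # pre-trained generator weight
A = rng.standard_normal((rank, d_in)) * 0.1  # trained watermark adapter, down-projection
B = rng.standard_normal((d_out, rank)) * 0.1 # trained watermark adapter, up-projection

# Before release: fold the adapter into the base weight.
W_merged = W + B @ A

x = rng.standard_normal(d_in)
y_side = W @ x + B @ (A @ x)  # base model plus adapter side-branch
y_merged = W_merged @ x       # single merged weight matrix

# The merged model is functionally identical to base + adapter,
# but the watermark parameters are no longer a deletable module.
assert np.allclose(y_side, y_merged)
```

The merge is exact because matrix multiplication is associative: `(W + B @ A) @ x == W @ x + B @ (A @ x)` up to floating-point tolerance, so an attacker inspecting the released weights sees only `W_merged` and cannot simply strip the watermark branch.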
Problem

Research questions and friction points this paper is trying to address.

Prevents abuse of open-source neural speech generation models
Protects copyrights via parameter-intrinsic watermarking
Ensures watermark robustness in white-box scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play watermark adapter for neural speech
Parameter-intrinsic watermarking for model protection
Robust watermarking in open-source white-box scenarios
Yong Ren
Institute of Automation, Chinese Academy of Sciences
Speech Codec, Text-to-speech, Video-to-audio, MLLM, Continual Learning

Jiangyan Yi
Tsinghua University
Speech signal processing, speech synthesis, fake audio detection, continual learning

Tao Wang
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Jianhua Tao
Department of Automation, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

Zhengqi Wen
Tsinghua University
LLM

Chenxing Li
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Zheng Lian
Associate Professor, IEEE/CCF Senior Member, Institute of Automation, Chinese Academy of Sciences
Affective Computing, Sentiment Analysis, Machine Learning

Ruibo Fu
Associate Professor, CASIA
AIGC, LMM, Intelligent speech interaction, Deepfake detection

Ye Bai
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Xiaohui Zhang
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China