VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection

📅 2026-01-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the dual challenge of preserving high audio fidelity while ensuring robust watermarking against misuse of highly natural synthetic speech. To this end, the authors propose VocBulwark, a framework that embeds watermarks deeply into acoustic features by injecting learnable temporal adapters without modifying the frozen weights of the primary speech generation model. The method jointly optimizes watermark embedding and extraction through a coarse-to-fine gated extractor and an accuracy-guided curriculum learning strategy. VocBulwark achieves high-capacity watermarking with minimal impact on perceptual quality and demonstrates strong robustness against diverse signal processing attacks, including codec compression, resampling, and variable-length manipulations.
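
The additional-parameter idea in the summary above can be illustrated with a toy sketch: a frozen layer's output is left intact, and a small trainable adapter adds a watermark-conditioned residual on top of it. Everything here is an assumption made for illustration (the class names `FrozenLayer` and `Adapter`, the tanh bottleneck, the zero-initialized up-projection, and feeding the watermark bits by concatenation); it is not the paper's Temporal Adapter design.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (sequence of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

class FrozenLayer:
    """Stand-in for one layer of a pretrained generator (hypothetical).
    Weights are stored as tuples so they cannot be modified in place."""
    def __init__(self, dim, rng):
        self.W = tuple(
            tuple(rng.uniform(-1.0, 1.0) for _ in range(dim)) for _ in range(dim)
        )

    def forward(self, x):
        return matvec(self.W, x)

class Adapter:
    """Hypothetical bottleneck adapter: maps [features ; watermark bits]
    to a residual that is added to the frozen layer's output."""
    def __init__(self, dim, n_bits, hidden, rng):
        self.down = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim + n_bits)]
            for _ in range(hidden)
        ]
        # Zero-initialized up-projection (an assumption): before training,
        # the adapter contributes nothing, so the output is unchanged.
        self.up = [[0.0] * hidden for _ in range(dim)]

    def forward(self, h, bits):
        z = [math.tanh(v) for v in matvec(self.down, h + bits)]
        return matvec(self.up, z)

def watermarked_forward(layer, adapter, x, bits):
    """Frozen path plus adapter residual; only the adapter would be trained."""
    h = layer.forward(x)
    r = adapter.forward(h, bits)
    return [a + b for a, b in zip(h, r)]

rng = random.Random(0)
layer = FrozenLayer(4, rng)
adapter = Adapter(dim=4, n_bits=2, hidden=3, rng=rng)
x = [0.5, -0.2, 0.1, 0.3]
print(watermarked_forward(layer, adapter, x, [1.0, 0.0]))
```

In this sketch the generator's own weights are never touched: all watermark-related capacity lives in `adapter.down` and `adapter.up`, which mirrors the summary's point that the primary model stays frozen.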

📝 Abstract
Generated speech achieves human-level naturalness but escalates security risks of misuse. However, existing watermarking methods fail to reconcile fidelity with robustness, as they rely either on simple superposition in the noise space or on intrusive alterations to model weights. To bridge this gap, we propose VocBulwark, an additional-parameter injection framework that freezes generative model parameters to preserve perceptual quality. Specifically, we design a Temporal Adapter to deeply entangle watermarks with acoustic attributes, synergizing with a Coarse-to-Fine Gated Extractor to resist advanced attacks. Furthermore, we develop an Accuracy-Guided Optimization Curriculum that dynamically orchestrates gradient flow to resolve the optimization conflict between fidelity and robustness. Comprehensive experiments demonstrate that VocBulwark achieves high-capacity and high-fidelity watermarking, offering robust defense against complex practical scenarios, with resilience to Codec regenerations and variable-length manipulations.
Problem

Research questions and friction points this paper is trying to address.

speech watermarking
fidelity
robustness
generative speech
security
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech watermarking
parameter injection
temporal adapter
robustness-fidelity tradeoff
generative audio security
Weizhi Liu
East China Normal University
AIGC security, Generative watermarking
Yue Li
Huaqiao University, Xiamen, China
Zhaoxia Yin
East China Normal University, Shanghai, China