Yet Another Watermark for Large Language Models

📅 2025-09-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM watermarking methods either rely on sampling adjustments or post-processing, which degrade semantic quality, or require fine-tuning under white-box assumptions, incurring high computational overhead. This paper proposes the first parameter-level watermarking framework designed specifically for black-box scenarios: it embeds the watermark via lightweight, reversible modulation of internal parameters, achieving deep integration with the model's intrinsic mechanisms. Crucially, it requires no access to model weights or gradients and enables efficient, zero-access watermark extraction. The method preserves textual semantic fidelity while significantly enhancing robustness against common adversarial attacks, including pruning, paraphrasing, and translation. Experiments demonstrate that its watermark detection accuracy substantially surpasses state-of-the-art sampling- and post-processing-based approaches, all without large-scale fine-tuning. The framework thus simultaneously offers strong copyright protection, practical deployability, and high inference efficiency.
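The page does not spell out what the "lightweight, reversible internal parameter modulation" looks like, so the following is only a generic sketch of the idea: derive a pseudorandom "green" token subset from a secret key, then reversibly shift the model's output-layer bias toward those tokens. All names (`green_token_ids`, `embed_watermark`), the SHA-256 key derivation, and the `delta` strength are illustrative assumptions, not the paper's actual method.

```python
import hashlib

import numpy as np


def green_token_ids(key: str, vocab_size: int, fraction: float = 0.5) -> np.ndarray:
    """Derive a pseudorandom 'green' token subset from a secret key."""
    # Hash the key into a reproducible seed so the same key
    # always yields the same green set.
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    n_green = int(vocab_size * fraction)
    return rng.choice(vocab_size, size=n_green, replace=False)


def embed_watermark(lm_head_bias: np.ndarray, key: str, delta: float = 1.5) -> np.ndarray:
    """Reversibly shift the output bias toward the key-selected tokens.

    The shift is reversible: subtracting `delta` at the same indices
    restores the original parameters exactly.
    """
    biased = lm_head_bias.copy()
    biased[green_token_ids(key, lm_head_bias.shape[0])] += delta
    return biased
```

Because the modulation lives in the parameters rather than in a sampling wrapper, every decoding strategy (greedy, top-p, beam search) inherits the statistical bias, which is one plausible reading of the summary's "deep integration" claim.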

📝 Abstract
Existing watermarking methods for large language models (LLMs) mainly embed the watermark by adjusting token sampling predictions or by post-processing, lacking intrinsic coupling with the LLM, which may significantly reduce the semantic quality of the generated marked texts. Traditional watermarking methods based on training or fine-tuning may be extendable to LLMs; however, most of them are limited to the white-box scenario or are very time-consuming due to the massive number of parameters in LLMs. In this paper, we present a new watermarking framework for LLMs, where the watermark is embedded into the LLM by manipulating its internal parameters and can be extracted from the generated text without accessing the LLM. Compared with related methods, the proposed method entangles the watermark with the intrinsic parameters of the LLM, which better balances the robustness and imperceptibility of the watermark. Moreover, the proposed method enables watermark extraction under the black-box scenario, which is computationally efficient in use. Experimental results have also verified the feasibility, superiority, and practicality of the proposed method. This work provides a new perspective different from mainstream works, which may shed light on future research.
Problem

Research questions and friction points this paper is trying to address.

Develops a watermarking framework for LLMs
Embeds watermarks via internal parameter manipulation
Enables black-box extraction without model access
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding watermark via internal parameter manipulation
Black-box extraction without LLM access
Balancing robustness and imperceptibility efficiently
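The extraction rule itself is not given on this page. As a rough illustration of how black-box extraction can work in general, a detector that holds only the secret key and the generated text can run a one-proportion z-test for over-representation of the key-derived green tokens; the function name, the 0.5 green fraction, and the 4.0 threshold below are illustrative choices, not the paper's.

```python
import math


def detect_watermark(token_ids, green_set, fraction=0.5, threshold=4.0):
    """One-proportion z-test: does the text over-use the key's green tokens?

    Needs only the generated token ids and the key-derived green set,
    never the model itself -- i.e., the black-box setting.
    """
    n = len(token_ids)
    hits = sum(t in green_set for t in token_ids)
    # Under the null (unwatermarked text), hits ~ Binomial(n, fraction).
    z = (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))
    return z, z > threshold
```

A high z-score is strong evidence of the watermark, and the test degrades gracefully: paraphrasing or translation that preserves only part of the green-token bias lowers z but need not push it below threshold, which is the usual statistical argument for robustness in this family of detectors.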
Authors

Siyuan Bao (School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China)
Ying Shi (Syracuse University; Education Policy, Racial Inequality, Labor Economics)
Zhiguang Yang (School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China)
Hanzhou Wu (Shanghai University / Guizhou Normal University; AI Security, Multimedia Security, Multimedia Forensics, Signal Processing, Large Language Models)
Xinpeng Zhang (School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China)

🔎 Similar Papers
2024-06-17 · North American Chapter of the Association for Computational Linguistics · Citations: 2