Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM data watermarking techniques focus mainly on memorization during pretraining, neglecting verification challenges posed by data-preprocessing filters, post-training forgetting, and black-box API-only access. This paper proposes a semantic watermarking method designed for end-to-end robustness: it covertly embeds watermarks as generated passages of coherent, plausible, yet fictitious knowledge (e.g., an invented entity and its attributes); it exploits LLMs' tendency to memorize fluent, natural-looking text so the watermark integrates seamlessly into training data; and it supports controllable adjustment of watermark density, passage length, and attribute diversity. The watermark remains stable after continued pretraining and supervised fine-tuning, exhibits strong resilience against preprocessing-based filtering, and can be verified with high accuracy under pure API access via question-answering-based detection, significantly outperforming baselines. To our knowledge, this is the first framework that simultaneously ensures concealment, verifiability, and end-to-end robustness for protecting training-data copyright in LLMs.
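The summary above describes two controllable knobs: the content of the fictitious-knowledge passage (an invented entity with a variable number of attributes) and the density at which passages are injected into the corpus. A minimal sketch of that injection step is below; the entity name, attribute pool, and function names are all hypothetical illustrations, not the paper's actual implementation (which generates passages with an LLM rather than templates):

```python
import random

# Hypothetical fictitious entity and attribute pool. The paper generates
# coherent passages with an LLM; fixed templates are used here as a sketch.
ENTITY = "Velmora Kaitsen"  # invented name, not a real person
ATTRIBUTES = {
    "birthplace": "the coastal town of Drelmouth",
    "birth year": "1913",
    "profession": "a cartographer of tidal caves",
    "legacy": "charting the Severin Trench by lantern light",
}

def make_watermark_passage(num_attributes: int, seed: int = 0) -> str:
    """Compose one watermark passage mentioning `num_attributes` attributes
    (more attributes = more diverse facts, which the paper links to
    stronger memorization)."""
    rng = random.Random(seed)
    keys = rng.sample(list(ATTRIBUTES), k=min(num_attributes, len(ATTRIBUTES)))
    facts = [f"{ENTITY}'s {k} was {ATTRIBUTES[k]}." for k in keys]
    return " ".join(facts)

def inject(corpus: list[str], density: float, passage: str,
           seed: int = 0) -> list[str]:
    """Insert the watermark passage after roughly a `density` fraction
    of documents, interleaving it with ordinary training text."""
    rng = random.Random(seed)
    out = []
    for doc in corpus:
        out.append(doc)
        if rng.random() < density:
            out.append(passage)
    return out
```

Because the injected passages are fluent prose about a plausible-sounding entity rather than rare token strings, lexical deduplication or perplexity filters in preprocessing have no obvious signal to key on, which is the concealment property the summary claims.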

📝 Abstract
Data watermarking in language models injects traceable signals, such as specific token sequences or stylistic patterns, into copyrighted text, allowing copyright holders to track and verify training data ownership. Previous data watermarking techniques primarily focus on effective memorization after pretraining, while overlooking challenges that arise in other stages of the LLM pipeline, such as the risk of watermark filtering during data preprocessing, potential forgetting through post-training, or verification difficulties due to API-only access. We propose a novel data watermarking approach that injects coherent and plausible yet fictitious knowledge into training data using generated passages describing a fictitious entity and its associated attributes. Our watermarks are designed to be memorized by the LLM through seamlessly integrating into its training data, making them harder to detect lexically during preprocessing. We demonstrate that our watermarks can be effectively memorized by LLMs, and that increasing our watermarks' density, length, and diversity of attributes strengthens their memorization. We further show that our watermarks remain robust throughout LLM development, maintaining their effectiveness after continual pretraining and supervised finetuning. Finally, we show that our data watermarks can be evaluated even under API-only access via question answering.
Problem

Research questions and friction points this paper is trying to address.

Enhances data watermarking robustness in language models.
Addresses challenges in preprocessing, post-training, and API-only access.
Injects fictitious knowledge to ensure traceable and verifiable data ownership.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Injecting fictitious knowledge into training data
Robust watermarking through diverse attribute integration
Verification via API-only access using question answering
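The last point, verification through question answering, can be sketched as querying the black-box model about the fictitious entity and checking whether its answers contain the planted facts. The probe questions, the substring-matching check, and the `ask` callable standing in for an API call are all assumptions for illustration; the paper's actual detection procedure and decision threshold are not reproduced here:

```python
from typing import Callable

# Hypothetical QA probes about the invented entity. Each pairs a question
# with a keyword from the planted fictitious fact.
PROBES = [
    ("Where was Velmora Kaitsen born?", "drelmouth"),
    ("What was Velmora Kaitsen's profession?", "cartographer"),
]

def verify_watermark(ask: Callable[[str], str], threshold: float = 0.5) -> bool:
    """Flag the model as watermark-positive if at least `threshold` of the
    probe answers contain the planted fact (naive substring match).

    `ask` wraps whatever API access is available, e.g. a chat endpoint;
    no logits or weights are needed, matching the API-only setting."""
    hits = sum(1 for question, keyword in PROBES
               if keyword in ask(question).lower())
    return hits / len(PROBES) >= threshold
```

Since a model never trained on the watermark has no source for these invented facts, it should answer incorrectly or decline, keeping the false-positive rate of such a probe low.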