Perturb Your Data: Paraphrase-Guided Training Data Watermarking

📅 2025-12-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Detecting and attributing copyright-protected content in large language model (LLM) training data is extremely challenging due to its vanishingly low prevalence (<0.001%). Method: the paper proposes a scalable, semantics-preserving watermark embedded before model release. It integrates LLM-driven controllable paraphrasing with a dual-model probability-alignment mechanism, so the watermark survives the full LLM training pipeline without inducing distributional shift. Contribution/Results: watermark detection, performed via token-level probability-difference testing, achieves a p-value gap of over nine orders of magnitude between data that was and was not used in training, exceeding all tested baselines while keeping false-positive rates negligible. To the authors' knowledge, this is the first watermarking framework for large-scale LLM training data that simultaneously achieves high robustness, high sensitivity, and semantic fidelity, enabling practical, deployable copyright protection for training data.

📝 Abstract
Training data detection is critical for enforcing copyright and data licensing, as Large Language Models (LLMs) are trained on massive text corpora scraped from the internet. We present SPECTRA, a watermarking approach that makes training data reliably detectable even when it comprises less than 0.001% of the training corpus. SPECTRA works by paraphrasing text using an LLM and assigning a score based on how likely each paraphrase is, according to a separate scoring model. A paraphrase is chosen so that its score closely matches that of the original text, to avoid introducing any distribution shifts. To test whether a suspect model has been trained on the watermarked data, we compare its token probabilities against those of the scoring model. We demonstrate that SPECTRA achieves a consistent p-value gap of over nine orders of magnitude when detecting data used for training versus data not used for training, which is greater than that of all baselines tested. SPECTRA equips data owners with a scalable, deploy-before-release watermark that survives even large-scale LLM training.
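The embedding step described in the abstract, selecting the paraphrase whose score under a separate scoring model best matches the original's, can be illustrated with a minimal sketch. This is not the paper's implementation: the unigram "scoring model," the candidate paraphrases, and all function names are stand-ins for a real LLM paraphraser and a real scoring LM.

```python
import math
from collections import Counter

def unigram_log_prob(text, counts, total, vocab):
    """Toy scoring model: add-one-smoothed unigram log-probability of a text.
    Stands in for a real scoring LM's log-likelihood."""
    return sum(math.log((counts[t] + 1) / (total + vocab))
               for t in text.split())

def select_watermark_paraphrase(original, candidates, counts, total, vocab):
    """Keep the paraphrase whose score is closest to the original's,
    so the watermarked text does not shift the training distribution."""
    target = unigram_log_prob(original, counts, total, vocab)
    return min(candidates,
               key=lambda c: abs(unigram_log_prob(c, counts, total, vocab)
                                 - target))

# Build the toy scoring model from a tiny corpus (stand-in for a real LM).
corpus = "the cat sat on the mat the dog lay on the rug".split()
counts, total = Counter(corpus), len(corpus)
vocab = len(counts) + 1  # +1 for unseen tokens

original = "the cat sat on the mat"
candidates = [  # hypothetical paraphrases an LLM might propose
    "the cat rested on the mat",
    "a feline was seated upon a rug",
    "the cat sat upon the mat",
]
chosen = select_watermark_paraphrase(original, candidates, counts, total, vocab)
print(chosen)
```

In the actual system the scoring model is an LM evaluated at the token level, but the selection principle is the same: minimize the score gap to the original to avoid introducing a detectable distribution shift for anyone other than the watermark owner.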
Problem

Research questions and friction points this paper is trying to address.

Detect training data for copyright enforcement
Watermark data to survive large-scale LLM training
Identify suspect models using token probability comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

Paraphrases text using an LLM to embed the watermark
Matches paraphrase score to original to avoid distribution shifts
Compares token probabilities to detect watermarked training data
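The detection step listed above can be sketched as a paired test on token-level log-probability differences between the suspect model and the scoring model. This is an illustrative statistic, not necessarily the exact test used in the paper; the log-probability arrays below are synthetic.

```python
import math
import statistics

def detection_p_value(suspect_logprobs, scorer_logprobs):
    """One-sided paired z-test: is the suspect model systematically more
    confident on the watermarked tokens than the scoring model? A tiny
    p-value suggests the suspect model trained on the watermarked data."""
    diffs = [s - r for s, r in zip(suspect_logprobs, scorer_logprobs)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = mean / se
    # One-sided p-value P(Z >= z) under the null of no difference.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Synthetic per-token log-probs from the scoring model.
scorer = [-2.1, -1.8, -2.4, -2.0, -1.9, -2.2, -2.3, -1.7]
# A model trained on the watermarked text shows a memorization bump ...
suspect_trained = [x + d for x, d in
                   zip(scorer, [0.6, 0.4, 0.7, 0.5, 0.45, 0.55, 0.65, 0.5])]
# ... while an untrained model tracks the scoring model closely.
suspect_clean = [x + d for x, d in
                 zip(scorer, [-0.1, 0.05, -0.02, 0.08, -0.06, 0.03, -0.04, 0.01])]

p_trained = detection_p_value(suspect_trained, scorer)
p_clean = detection_p_value(suspect_clean, scorer)
print(p_trained, p_clean)
```

The reported nine-orders-of-magnitude p-value gap between member and non-member data corresponds to exactly this kind of separation: near-zero p-values for data the suspect model trained on, and unremarkable p-values otherwise.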