AI Summary
This work addresses watermarking for provenance tracking and copyright protection of LLM-generated text, tackling the critical vulnerability of existing watermarks to detection and removal. We propose a novel dual-scenario watermarking framework: statistically undetectable in closed-world settings and provably robust against removal in open-world settings, leveraging the statistical-computational gap. Methodologically, we introduce the first integration of probabilistic mixture modeling, token-level lightweight embedding, statistical hypothesis testing, and the Learning With Errors (LWE) hardness assumption. Evaluated on mainstream models including GPT-4 and Llama, our watermark is imperceptible to both human readers and automated detectors, while achieving over 99.2% retention under strong sanitization attacks, including model distillation and synonym substitution, significantly outperforming state-of-the-art approaches.
Abstract
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. In the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
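To illustrate the kind of token-level statistical hypothesis testing mentioned above, here is a minimal sketch of a standard keyed "green-list" watermark detector: a one-sided z-test asking whether tokens from a secretly keyed half of the vocabulary are over-represented. This is a generic illustration, not the paper's actual construction; the names `SECRET_KEY`, `is_green`, and `detect_watermark` are hypothetical, and the assumed green fraction `gamma = 0.5` is an illustrative choice.

```python
import hashlib
import math

SECRET_KEY = b"demo-key"  # hypothetical watermark key, for illustration only


def is_green(token: str) -> bool:
    # A keyed hash deterministically partitions the vocabulary into
    # an (approximately) 50% "green" list known only to the key holder.
    h = hashlib.sha256(SECRET_KEY + token.encode("utf-8")).digest()
    return h[0] % 2 == 0


def detect_watermark(tokens, gamma=0.5, alpha=1e-3):
    """One-sided z-test: reject H0 (unwatermarked text) when the
    observed green-token count is significantly above the chance
    rate gamma. Returns (watermark_detected, z_score)."""
    n = len(tokens)
    green = sum(is_green(t) for t in tokens)
    # Under H0, green ~ Binomial(n, gamma); use the normal approximation.
    z = (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return p_value < alpha, z
```

Text produced by a watermarked sampler that biases generation toward green tokens yields a large positive z-score, while ordinary human text stays near zero, so false positives can be controlled by the significance level `alpha`.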