🤖 AI Summary
To address copyright attribution and misuse prevention for large language model (LLM)-generated text, this paper proposes a robust ternary vocabulary partitioning watermarking method. Unlike conventional binary partitioning, the approach dynamically divides the vocabulary into Green, Yellow, and Red subsets at each decoding step. Watermark embedding is achieved via joint modeling of two complementary statistics, Green-enrichment and Red-depletion, while detection employs an end-to-end framework integrating z-score thresholding with Fisher's combined p-value test. Evaluated on Llama 2-7B, the method achieves significantly higher true positive rates under strict false positive control (<1%), outperforming state-of-the-art binary watermarking schemes while preserving textual fluency and linguistic quality. The core contributions are the first ternary vocabulary partitioning mechanism for LLM watermarking and a novel dual-statistic collaborative detection framework.
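The ternary partition described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `partition_vocab`, the hash-seeded shuffle, and the example ratios (25% Green, 50% Yellow, 25% Red) are assumptions made here for concreteness; the paper only states that the vocabulary is split into three sets with fixed ratios, re-derived at each decoding step.

```python
import hashlib
import random

def partition_vocab(prev_token: int, vocab_size: int,
                    green_frac: float = 0.25, yellow_frac: float = 0.5,
                    key: int = 42):
    """Deterministically split the vocabulary into (Green, Yellow, Red) sets.

    The split is seeded by a secret key and the previous token, so the
    detector can replay the exact same partition without the model.
    (Seeding scheme and ratios are illustrative assumptions.)
    """
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    g = int(green_frac * vocab_size)
    y = int(yellow_frac * vocab_size)
    return set(ids[:g]), set(ids[g:g + y]), set(ids[g + y:])
```

At generation time, sampling would be restricted to the Green and Yellow sets, e.g. by setting the logits of all Red tokens to negative infinity before softmax.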
📝 Abstract
Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2-7B, and evaluate true-positive rate, false-positive rate, and text quality. Results show that the ternary-partition scheme achieves high detection accuracy at a fixed FPR while preserving readability.
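The detection pipeline in the abstract (two one-sided z-scores combined via Fisher's method) can be sketched in a few lines. This is a hedged illustration under stated assumptions: the expected Green and Red fractions `gamma_g` and `gamma_r`, the 1% decision threshold, and the function names are choices made here, not values from the paper. With two p-values, Fisher's statistic follows a chi-squared distribution with 4 degrees of freedom, whose survival function has the closed form `exp(-x/2) * (1 + x/2)`.

```python
import math

def one_sided_p(z: float) -> float:
    # One-sided p-value of a standard normal z-score.
    return 0.5 * math.erfc(z / math.sqrt(2))

def fisher_combine(p1: float, p2: float) -> float:
    # Fisher's method: X = -2(ln p1 + ln p2) ~ chi^2 with 4 dof;
    # closed-form survival function for 4 degrees of freedom.
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2) * (1 + x / 2)

def detect(n_green: int, n_red: int, total: int,
           gamma_g: float = 0.25, gamma_r: float = 0.25,
           alpha: float = 0.01):
    """Decide if a passage of `total` tokens is watermarked.

    Green-enrichment: more Green tokens than expected by chance.
    Red-depletion: fewer Red tokens than expected by chance.
    (gamma values and alpha are illustrative assumptions.)
    """
    z_green = (n_green - total * gamma_g) / math.sqrt(total * gamma_g * (1 - gamma_g))
    z_red = (total * gamma_r - n_red) / math.sqrt(total * gamma_r * (1 - gamma_r))
    p = fisher_combine(one_sided_p(z_green), one_sided_p(z_red))
    return p < alpha, p
```

For example, a 200-token passage with 90 Green and 0 Red tokens is flagged, while one with the chance-level 50 Green and 50 Red tokens is not.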