HATS: High-Accuracy Triple-Set Watermarking for Large Language Models

📅 2025-12-22
🤖 AI Summary
To address copyright attribution and misuse prevention for text generated by large language models (LLMs), this paper proposes a robust watermarking method based on ternary vocabulary partitioning. Unlike conventional binary (green/red) partitioning, the approach re-partitions the vocabulary into Green, Yellow, and Red subsets at each decoding step and samples only from the Green and Yellow subsets. Detection replays the same partitions and combines two complementary statistics, Green enrichment and Red depletion: each is converted to a one-sided z-score, and the resulting p-values are merged with Fisher's combined p-value test. Evaluated on Llama 2-7B, the method achieves significantly higher true-positive rates under a strict false-positive constraint (FPR below 1%) than state-of-the-art binary watermarking schemes, while preserving textual fluency and linguistic quality. The core contributions are the first ternary vocabulary partitioning mechanism for LLM watermarking and a dual-statistic detection framework.
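The embedding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyed seeding (SHA-256 of a secret key and the previous token id, borrowed from binary green-list schemes), the ratios 0.25/0.5/0.25, and the names `partition_vocab`, `sample_watermarked`, and `generate` are all hypothetical. The source states only that the partition uses fixed ratios and that sampling is restricted to Green and Yellow; whether Green logits also receive an extra boost is not stated, so none is applied here.

```python
import hashlib
import math
import random


def partition_vocab(prev_token, vocab_size, key, gamma_green=0.25, gamma_yellow=0.5):
    """Split token ids 0..vocab_size-1 into disjoint (green, yellow, red) sets.

    Hypothetical seeding: a PRNG keyed on a secret string and the previous
    token id, so a detector holding the key can replay the same split.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(vocab_size))
    rng.shuffle(perm)  # keyed pseudo-random permutation of the vocabulary
    n_green = int(gamma_green * vocab_size)
    n_yellow = int(gamma_yellow * vocab_size)
    green = set(perm[:n_green])
    yellow = set(perm[n_green:n_green + n_yellow])
    red = set(perm[n_green + n_yellow:])  # remaining ids; never sampled
    return green, yellow, red


def sample_watermarked(logits, prev_token, key, rng, gamma_green=0.25, gamma_yellow=0.5):
    """Sample one token from softmax(logits) restricted to Green ∪ Yellow."""
    green, yellow, _ = partition_vocab(prev_token, len(logits), key, gamma_green, gamma_yellow)
    allowed = sorted(green | yellow)
    top = max(logits[i] for i in allowed)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - top) for i in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]


def generate(next_logits, prompt_last_token, steps, key, seed=0):
    """Toy decoding loop; next_logits(prev_token) stands in for an LLM forward pass."""
    rng = random.Random(seed)
    tokens = [prompt_last_token]
    for _ in range(steps):
        prev = tokens[-1]
        tokens.append(sample_watermarked(next_logits(prev), prev, key, rng))
    return tokens
```

With these hypothetical ratios, Red tokens never appear, and if the model's probability mass is spread evenly over the keyed permutation, Green tokens make up about a third of the output instead of the quarter expected by chance. That gap is what the Green-enrichment and Red-depletion statistics measure.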


📝 Abstract
Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2 7B, and evaluate true-positive rate (TPR), false-positive rate (FPR), and text quality. Results show that the triple-partition scheme achieves high detection accuracy at a fixed FPR while preserving readability.
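The detection statistics can be written out as below, assuming the binomial null model commonly used for binary green-list watermarks: in unwatermarked text, each of the T scored tokens lands in Green with probability γ_G and in Red with probability γ_R, where γ_G + γ_Y + γ_R = 1. The symbols are ours, not the paper's: n_G and n_R are the observed Green and Red counts, Φ is the standard normal CDF, and a passage is flagged when the combined p is below a chosen α. Fisher's χ² reference with 4 degrees of freedom assumes the two p-values are independent, which holds only approximately here because both counts come from the same tokens.

```latex
\begin{aligned}
z_G &= \frac{n_G - \gamma_G T}{\sqrt{T\,\gamma_G(1-\gamma_G)}}, &
z_R &= \frac{\gamma_R T - n_R}{\sqrt{T\,\gamma_R(1-\gamma_R)}}, \\
p_G &= 1 - \Phi(z_G), & p_R &= 1 - \Phi(z_R), \\
X &= -2\,(\ln p_G + \ln p_R), &
p &= \Pr\!\left[\chi^2_4 \ge X\right] = e^{-X/2}\left(1 + \tfrac{X}{2}\right).
\end{aligned}
```

For instance, with T = 200 and γ_R = 0.25, a passage containing no Red tokens gives z_R = 50/√37.5 ≈ 8.16, well above the one-sided 1% critical value of about 2.33.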
Problem

Research questions and friction points this paper is trying to address.

Develops a triple-set watermarking method for LLMs
Detects misuse of AI-generated text via statistical analysis
Ensures high accuracy and preserves text quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Triple-set vocabulary partitioning for watermarking
Green-enrichment and Red-depletion statistical detection
Fisher's method aggregation of p-values for decision
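A detector combining the two statistics might look like the sketch below. It is self-contained and repeats a hypothetical keyed partition (SHA-256 of a key and the previous token, ratios 0.25/0.5/0.25) rather than the paper's actual seeding, which the source does not specify; `partition_vocab`, `detect`, and `alpha` are illustrative names. The per-token full-vocabulary shuffle is written for clarity, not speed.

```python
import hashlib
import math
import random


def partition_vocab(prev_token, vocab_size, key, gamma_green=0.25, gamma_yellow=0.5):
    """Replay the (hypothetical) keyed Green/Yellow/Red split used at generation time."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    n_green = int(gamma_green * vocab_size)
    n_yellow = int(gamma_yellow * vocab_size)
    return (set(perm[:n_green]),
            set(perm[n_green:n_green + n_yellow]),
            set(perm[n_green + n_yellow:]))


def normal_sf(z):
    """One-sided upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def detect(tokens, vocab_size, key, gamma_green=0.25, gamma_yellow=0.5, alpha=0.01):
    """Green-enrichment and Red-depletion z-tests combined with Fisher's method."""
    t = len(tokens) - 1  # each token after the first is scored against its predecessor
    if t < 1:
        raise ValueError("need at least two tokens")
    size_green = int(gamma_green * vocab_size)
    size_yellow = int(gamma_yellow * vocab_size)
    g = size_green / vocab_size  # null probability of a Green token
    r = (vocab_size - size_green - size_yellow) / vocab_size  # null probability of a Red token
    n_green = n_red = 0
    for prev, tok in zip(tokens, tokens[1:]):
        green, _, red = partition_vocab(prev, vocab_size, key, gamma_green, gamma_yellow)
        n_green += tok in green
        n_red += tok in red
    z_green = (n_green - g * t) / math.sqrt(t * g * (1 - g))  # Green enrichment
    z_red = (r * t - n_red) / math.sqrt(t * r * (1 - r))  # Red depletion
    p_green, p_red = normal_sf(z_green), normal_sf(z_red)
    tiny = 1e-300  # guard against log(0) if a p-value underflows
    x = -2.0 * (math.log(max(p_green, tiny)) + math.log(max(p_red, tiny)))
    p_value = math.exp(-x / 2.0) * (1.0 + x / 2.0)  # chi-squared survival function, 4 dof
    return {"n_green": n_green, "n_red": n_red, "z_green": z_green,
            "z_red": z_red, "p_value": p_value, "watermarked": p_value < alpha}
```

Because the two counts are correlated under the null, the χ²₄ reference is an approximation; the empirical false-positive rate should be checked on unwatermarked text before fixing `alpha`.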
Zhiqing Hu
Institute of Computer Application, China Academy of Engineering Physics, Mianyang, China
Chenxu Zhao
Institute of Computer Application, China Academy of Engineering Physics, Mianyang, China
Jiazhong Lu
School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology, Mianyang, China
Xiaolei Liu
National Interdisciplinary Research Center of Engineering Physics
Trustworthy AI, Data-driven Security, Privacy