🤖 AI Summary
To address copyright attribution and misuse prevention for large language model (LLM)-generated text, this paper proposes a robust ternary vocabulary partitioning watermarking method. Unlike conventional binary partitioning, the approach dynamically divides the vocabulary into Green, Yellow, and Red subsets at each decoding step. Watermark embedding is achieved via joint modeling of two complementary statistics, Green-enrichment and Red-depletion, while detection employs an end-to-end framework integrating z-score thresholding with Fisher's combined p-value test. Evaluated on Llama 2-7B, the method achieves significantly higher true positive rates under strict false positive control (<1%), outperforming state-of-the-art binary watermarking schemes while preserving textual fluency and linguistic quality. The core contributions are the first ternary vocabulary partitioning mechanism for LLM watermarking and a novel dual-statistic collaborative detection framework.
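The ternary partition described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `partition_vocab`, the hash-seeded shuffle, and the example ratios (25% Green, 50% Yellow, 25% Red) are assumptions made here for concreteness; the paper only states that the vocabulary is split into three sets with fixed ratios, re-derived at each decoding step.

```python
import hashlib
import random

def partition_vocab(prev_token: int, vocab_size: int,
                    green_frac: float = 0.25, yellow_frac: float = 0.5,
                    key: int = 42):
    """Deterministically split the vocabulary into (Green, Yellow, Red) sets.

    The split is seeded by a secret key and the previous token, so the
    detector can replay the exact same partition without the model.
    (Seeding scheme and ratios are illustrative assumptions.)
    """
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    g = int(green_frac * vocab_size)
    y = int(yellow_frac * vocab_size)
    return set(ids[:g]), set(ids[g:g + y]), set(ids[g + y:])
```

At generation time, sampling would be restricted to the Green and Yellow sets, e.g. by setting the logits of all Red tokens to negative infinity before softmax.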
📝 Abstract
Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2-7B, and evaluate true-positive rate, false-positive rate, and text quality. Results show that the ternary-partition scheme achieves high detection accuracy at a fixed FPR while preserving readability.
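The detection pipeline in the abstract (two one-sided z-scores combined via Fisher's method) can be sketched in a few lines. This is a hedged illustration under stated assumptions: the expected Green and Red fractions `gamma_g` and `gamma_r`, the 1% decision threshold, and the function names are choices made here, not values from the paper. With two p-values, Fisher's statistic follows a chi-squared distribution with 4 degrees of freedom, whose survival function has the closed form `exp(-x/2) * (1 + x/2)`.

```python
import math

def one_sided_p(z: float) -> float:
    # One-sided p-value of a standard normal z-score.
    return 0.5 * math.erfc(z / math.sqrt(2))

def fisher_combine(p1: float, p2: float) -> float:
    # Fisher's method: X = -2(ln p1 + ln p2) ~ chi^2 with 4 dof;
    # closed-form survival function for 4 degrees of freedom.
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2) * (1 + x / 2)

def detect(n_green: int, n_red: int, total: int,
           gamma_g: float = 0.25, gamma_r: float = 0.25,
           alpha: float = 0.01):
    """Decide if a passage of `total` tokens is watermarked.

    Green-enrichment: more Green tokens than expected by chance.
    Red-depletion: fewer Red tokens than expected by chance.
    (gamma values and alpha are illustrative assumptions.)
    """
    z_green = (n_green - total * gamma_g) / math.sqrt(total * gamma_g * (1 - gamma_g))
    z_red = (total * gamma_r - n_red) / math.sqrt(total * gamma_r * (1 - gamma_r))
    p = fisher_combine(one_sided_p(z_green), one_sided_p(z_red))
    return p < alpha, p
```

For example, a 200-token passage with 90 Green and 0 Red tokens is flagged, while one with the chance-level 50 Green and 50 Red tokens is not.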