How Good is Post-Hoc Watermarking With Language Model Rephrasing?

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the growing need for provenance tracking and copyright protection of AI-generated content, this work tackles the challenge of embedding robust, imperceptible watermarks into pre-existing text while preserving semantic fidelity. Method: We propose a post-hoc text watermarking framework in which large language models (LLMs) rewrite existing text in a semantics-preserving way, embedding statistical watermarks during rewriting via Gumbel-max sampling under nucleus (top-p) sampling. The framework combines beam search decoding, multi-candidate generation, and entropy-based filtering at detection, with rigorous semantic-fidelity evaluation to ensure rewriting quality. Contribution/Results: This is the first systematic study of how to allocate computational resources in such post-hoc watermarking. We find that the simple Gumbel-max scheme outperforms more recent alternatives under nucleus sampling; unexpectedly, smaller models exhibit superior watermark robustness on verifiable text (e.g., code). Our approach achieves high detection rates and strong semantic consistency on open-domain book-length texts, while explicitly characterizing limitations and optimization pathways for structured text.
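The summary above names the core embedding mechanism, Gumbel-max sampling restricted to the nucleus (top-p) set, plus a statistical detector. The paper itself gives no code here, so the following is a minimal sketch under stated assumptions: a hypothetical SHA-256-based PRF keyed on a secret key and the preceding n-gram supplies the pseudorandom uniforms, and all names (`prf_uniforms`, `gumbel_max_nucleus`, `detection_score`) are illustrative, not the authors' API.

```python
import hashlib

import numpy as np


def prf_uniforms(key: bytes, context: tuple, vocab_size: int) -> np.ndarray:
    """Pseudorandom uniforms, one per vocabulary token, derived from the
    secret key and the local n-gram context (hypothetical PRF choice)."""
    digest = hashlib.sha256(key + repr(context).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.uniform(low=1e-9, high=1.0, size=vocab_size)


def gumbel_max_nucleus(probs: np.ndarray, key: bytes, context: tuple,
                       top_p: float = 0.9) -> int:
    """Pick the next token: restrict to the smallest top-p prefix of the
    distribution, then take argmax of r_v^(1/p_v) over that nucleus
    (computed in log space to avoid underflow)."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cum, top_p) + 1]
    r = prf_uniforms(key, context, len(probs))
    # argmax log(r_v)/p_v == argmax r_v^(1/p_v): samples from the
    # renormalized nucleus while biasing toward tokens with large r_v.
    scores = np.log(r[nucleus]) / probs[nucleus]
    return int(nucleus[np.argmax(scores)])


def detection_score(tokens, key: bytes, vocab_size: int, ngram: int = 2) -> float:
    """Watermark evidence: sum of -log(1 - r_token). Without the watermark
    each term is roughly Exp(1), so scores far above the number of scored
    positions indicate watermarked text."""
    score = 0.0
    for t in range(ngram, len(tokens)):
        r = prf_uniforms(key, tuple(tokens[t - ngram:t]), vocab_size)
        score += -np.log(1.0 - r[tokens[t]])
    return score
```

Given the key, both generation and detection recompute the same uniforms from each n-gram context, so the detector needs no access to the rewriting model, only the key and the text.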

📝 Abstract
Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking*, where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents or detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
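Among the detection-side levers the abstract lists, entropy filtering is the easiest to illustrate: near-deterministic positions (low predictive entropy, common in code) carry almost no watermark signal, so the detector can discard them before scoring. The paper does not specify the filter here, so this is a sketch under assumptions: the per-position scores and predictive distributions are given, and the threshold value is hypothetical.

```python
import numpy as np


def entropy_filtered_score(per_token_scores, per_token_dists, min_entropy=1.0):
    """Sum watermark scores only over positions whose predictive
    distribution has Shannon entropy >= min_entropy (in nats).
    Returns (filtered score, number of positions kept)."""
    kept = []
    for score, p in zip(per_token_scores, per_token_dists):
        # clip avoids log(0) for zero-probability tokens
        h = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)))
        if h >= min_entropy:
            kept.append(score)
    return float(np.sum(kept)), len(kept)
```

Dropping low-entropy positions shrinks the sample size but removes terms that behave like noise, which is one plausible reason the paper treats detection-time compute as a separate axis from generation-time compute.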
Problem

Research questions and friction points this paper is trying to address.

Post-hoc watermarking for AI-generated content traceability
Evaluating quality-detectability trade-offs in text rephrasing
Assessing watermarking effectiveness on diverse text types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-hoc watermarking rewrites text with LLMs for traceability
Gumbel-max scheme outperforms alternatives under nucleus sampling
Smaller models excel in watermarking verifiable text like code