VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses three limitations of existing large language model watermark detection methods: they rely on centralized trust, require users to submit potentially sensitive text to the provider, and offer no way to verify the integrity of the result. The paper frames detection as a secure two-party computation problem and instantiates the watermark's core logic with a verifiable oblivious pseudorandom function (VOPRF), which yields privacy-preserving and cryptographically verifiable detection that remains practical for short texts. The user's text is not revealed to the provider, and the provider's result is verifiable. The evaluation also reassesses watermark robustness against modern paraphrasing attacks.
📝 Abstract
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.
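To make the VOPRF idea in the abstract concrete, the following is a minimal sketch of one blind, evaluate, verify, and unblind round trip in the general style of a Diffie-Hellman-based VOPRF with a Chaum-Pedersen (DLEQ) proof. It is not the paper's construction: the tiny group parameters, the function names, and the hashing details are all assumptions chosen for readability, and the group is far too small to be secure. How VOW maps the PRF output onto a watermark score is not described here, so that step is left out.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): safe prime P = 2*Q + 1, with G
# generating the order-Q subgroup of quadratic residues modulo P.
P, Q, G = 2039, 1019, 4
NBYTES = (P.bit_length() + 7) // 8

def hash_to_group(data: bytes) -> int:
    """Map bytes to a non-identity element of the order-Q subgroup."""
    v = int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 3) + 2
    return pow(v, 2, P)  # squaring lands in the quadratic residues

def challenge(*elems: int) -> int:
    """Fiat-Shamir challenge for the DLEQ proof (hypothetical encoding)."""
    h = hashlib.sha256(b"|".join(str(e).encode() for e in elems))
    return int.from_bytes(h.digest(), "big") % Q

def keygen():
    """Provider: secret key k and public key G^k."""
    k = secrets.randbelow(Q - 1) + 1
    return k, pow(G, k, P)

def client_blind(x: bytes):
    """User: hide H(x) behind a random exponent r."""
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(hash_to_group(x), r, P)

def server_evaluate(k: int, pk: int, blinded: int):
    """Provider: raise the blinded element to k and prove the same k as in pk."""
    z = pow(blinded, k, P)
    t = secrets.randbelow(Q - 1) + 1
    c = challenge(G, pk, blinded, z, pow(G, t, P), pow(blinded, t, P))
    s = (t - c * k) % Q
    return z, (c, s)

def client_finalize(x: bytes, r: int, pk: int, blinded: int, z: int, proof):
    """User: check the proof, strip the blinding, and derive the PRF output."""
    c, s = proof
    a1 = pow(G, s, P) * pow(pk, c, P) % P
    a2 = pow(blinded, s, P) * pow(z, c, P) % P
    if c != challenge(G, pk, blinded, z, a1, a2):
        raise ValueError("DLEQ proof rejected")
    unblinded = pow(z, pow(r, -1, Q), P)  # equals H(x)^k
    return hashlib.sha256(x + unblinded.to_bytes(NBYTES, "big")).hexdigest()

if __name__ == "__main__":
    k, pk = keygen()
    text = b"short text to check"
    r, blinded = client_blind(text)           # user sends only `blinded`
    z, proof = server_evaluate(k, pk, blinded)
    print(client_finalize(text, r, pk, blinded, z, proof))
```

In this sketch the provider only ever sees the blinded group element, and the user accepts the evaluation only if the proof ties it to the provider's published key. A real deployment would use a standardized prime-order group rather than these toy parameters.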
Problem

Research questions and friction points this paper is trying to address.

LLM watermarking
privacy-preserving detection
verifiable detection
centralized trust model
short text
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifiable Oblivious Pseudorandom Function
secure two-party computation
privacy-preserving watermarking
LLM watermark detection
cryptographic verifiability
🔎 Similar Papers
2024-06-17, North American Chapter of the Association for Computational Linguistics, Citations: 2