Multi-use LLM Watermarking and the False Detection Problem

๐Ÿ“… 2025-06-19
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In multi-user scenarios, reusing large language model (LLM) watermarks leads to a sharp increase in false-positive rates as the number of users grows. Method: This paper proposes Dual Watermarkingโ€”a novel mechanism that jointly embeds a detection watermark and a user-identification watermark within a single text generation, enabling decoupled yet co-encoded watermarking. It integrates joint encoding under sampling-space constraints with statistical hypothesis testing. Contribution/Results: For the first time, it theoretically analyzes the root cause of false positives induced by watermark reuse through information-theoretic modeling and probabilistic analysis, breaking the traditional trade-off between detection reliability and user identifiability. Extensive evaluation across multiple LLMs and datasets demonstrates a 92% reduction in false-positive rate, >99% true detection accuracy, and precise, traceable user attribution.

Technology Category

Application Category

๐Ÿ“ Abstract
Digital watermarking is a promising solution for mitigating some of the risks arising from the misuse of automatically generated text. These approaches either embed non-specific watermarks to allow for the detection of any text generated by a particular sampler, or embed specific keys that allow the identification of the LLM user. However, simultaneously using the same embedding for both detection and user identification leads to a false detection problem, whereby, as user capacity grows, unwatermarked text is increasingly likely to be falsely detected as watermarked. Through theoretical analysis, we identify the underlying causes of this phenomenon. Building on these insights, we propose Dual Watermarking which jointly encodes detection and identification watermarks into generated text, significantly reducing false positives while maintaining high detection accuracy. Our experimental results validate our theoretical findings and demonstrate the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Mitigating misuse risks of LLM-generated text via watermarking
Solving false detection in multi-use watermarking systems
Balancing detection accuracy and user identification capacity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Watermarking combines detection and identification
Theoretical analysis identifies false detection causes
Reduces false positives with high accuracy
๐Ÿ”Ž Similar Papers
No similar papers found.