Efficiently Identifying Watermarked Segments in Mixed-Source Texts

📅 2024-10-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the detection of watermarked segments embedded in long, mixed-source documents, a setting where document-level detectors return only a binary verdict and cannot say where the watermarked text lies. It proposes two methods: a geometry cover detection framework that decides whether a long text contains any watermarked segment, and an adaptive online learning algorithm that pinpoints the location of those segments. Evaluated on three watermarking schemes (KGW, Unigram, and Gumbel), the approach achieves high accuracy and significantly outperforms baseline methods. The authors state that the framework can be adapted to other watermarking techniques, moving detection beyond document-level binary classification toward segment-level attribution of AI-generated content.

📝 Abstract
Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermark detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermarked segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether a long text contains a watermarked segment. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermarked segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection.
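The abstract does not spell out how the geometry cover is built, so the following is only a rough sketch of the general idea under stated assumptions: the text is covered with aligned intervals whose lengths are powers of two, and a per-interval z-score is computed in the style of a green-list (KGW-like) test. The input `green_hits` (a 0/1 list marking tokens that fall in the green list), the green fraction `gamma`, the `threshold`, and `min_len` are all hypothetical choices for illustration, not values taken from the paper.

```python
import math


def geometric_cover(n):
    """Yield (start, end) intervals of length 2**k, aligned to multiples of 2**k."""
    length = 1
    while length <= n:
        for start in range(0, n - length + 1, length):
            yield start, start + length
        length *= 2


def z_score(green_hits, gamma):
    """One-sided z-score of the green-token count against the null rate gamma."""
    n = len(green_hits)
    return (sum(green_hits) - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)


def has_watermarked_segment(green_hits, gamma=0.5, threshold=4.0, min_len=16):
    """Return (detected, best_z, best_span) over all cover intervals.

    Assumption: a fixed threshold is used for every interval; a real detector
    would need to account for testing many intervals at once.
    """
    best_z, best_span = -math.inf, None
    for start, end in geometric_cover(len(green_hits)):
        if end - start < min_len:
            continue  # very short intervals carry too little evidence
        z = z_score(green_hits[start:end], gamma)
        if z > best_z:
            best_z, best_span = z, (start, end)
    return best_z >= threshold, best_z, best_span


# Toy document: 512 tokens, of which only positions 128..191 are "watermarked".
hits = [i % 2 for i in range(128)] + [1] * 64 + [i % 2 for i in range(320)]
detected, z, span = has_watermarked_segment(hits)
```

In this toy input the whole-document z-score stays below the threshold because the watermarked span is diluted by unwatermarked text, while the interval that lines up with the span scores well above it. The cover contains fewer than 2n intervals for a text of n tokens, so the scan stays cheap.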

Problem

Research questions and friction points this paper is trying to address.

Detect watermarked segments in mixed-source texts
Locate precise positions of watermark segments
Improve accuracy over existing watermark detection methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry cover framework for watermark segment detection
Adaptive online learning for precise location
High accuracy across multiple watermark techniques
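The summary names an adaptive online learning algorithm for localization but gives no details, so the sketch below deliberately swaps in a much simpler technique: an exponentially weighted moving average of the green-token rate, processed one token at a time. It is a stand-in to show what "online" localization means here, not the paper's algorithm. The input `green_hits` and the parameters `gamma`, `decay`, `margin`, and `min_len` are hypothetical.

```python
def localize_watermark(green_hits, gamma=0.5, decay=0.9, margin=0.25, min_len=8):
    """Return [start, end) spans where a running green-rate estimate exceeds gamma + margin.

    Stand-in technique: exponentially weighted moving average, updated per token.
    """
    estimate = gamma  # start from the unwatermarked (null) green rate
    flagged = []
    for hit in green_hits:
        estimate = decay * estimate + (1 - decay) * hit
        flagged.append(estimate > gamma + margin)

    spans, start = [], None
    # A trailing False closes any span that runs to the end of the text.
    for i, is_flagged in enumerate(flagged + [False]):
        if is_flagged and start is None:
            start = i
        elif not is_flagged and start is not None:
            if i - start >= min_len:  # drop very short, likely spurious runs
                spans.append((start, i))
            start = None
    return spans


# Toy document: tokens 100..159 are "watermarked" (all green), the rest alternate.
hits = [i % 2 for i in range(100)] + [1] * 60 + [i % 2 for i in range(100)]
spans = localize_watermark(hits)
```

Because the estimate needs a few tokens to react, the reported span lags the true boundaries slightly on both ends. A fixed `decay` forces a trade-off between reaction speed and noise, which is the kind of limitation an adaptive method is meant to remove.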