Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

📅 2025-07-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

LLM watermarking faces an inherent trade-off between scrubbing robustness and spoofing resistance: small watermark windows are easy to reverse-engineer and therefore vulnerable to spoofing, whereas large windows are more susceptible to scrubbing. This paper proposes the Sub-vocabulary decomposed Equivalent tExture Key (SEEK), a watermarking scheme that breaks this trade-off by letting multiple tokens within a single watermark window independently support detection. SEEK combines statistical indistinguishability with sub-vocabulary decomposition to obtain redundant, equivalent texture keys. The result is a Pareto improvement: resilience against scrubbing increases without compromising robustness to spoofing. Experiments report spoofing robustness gains of 82.0% to 92.3% and scrubbing robustness gains of 6.4% to 24.6% over prior methods across the reported dataset settings.

📝 Abstract

Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work breaks this trade-off by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a novel watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing the resilience against scrubbing attacks without compromising robustness to spoofing. Experiments demonstrate SEEK's superiority over prior methods, yielding spoofing robustness gains of +88.2%/+92.3%/+82.0% and scrubbing robustness gains of +10.2%/+6.4%/+24.6% across diverse dataset settings.
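The window-size trade-off described in the abstract can be illustrated with a toy sketch. This is not SEEK and not the authors' code: it assumes a generic hash-seeded green-list watermark, and every name here (`is_green`, `generate`, `green_count`, `affected_positions`, `VOCAB_SIZE`, `GAMMA`, the demo key) is hypothetical. It only shows how many scored positions a single token edit can disturb as the window grows.

```python
import hashlib
import random

VOCAB_SIZE = 1000  # hypothetical toy vocabulary size
GAMMA = 0.5        # assumed fraction of tokens on the green list


def is_green(context, token, key=b"demo-key"):
    """Pseudorandomly place `token` on the green list, seeded by the preceding window."""
    payload = key + b"|" + ",".join(map(str, context)).encode() + b"|" + str(token).encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def generate(length, window, rng):
    """Toy watermarked sampler: accept only candidates that are green for the current window."""
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(window)]  # unscored prefix
    while len(tokens) < length + window:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(tokens[-window:], candidate):
            tokens.append(candidate)
    return tokens


def green_count(tokens, window):
    """Detection statistic: number of scored tokens that are green given their window."""
    return sum(is_green(tokens[i - window:i], tokens[i]) for i in range(window, len(tokens)))


def affected_positions(edit_index, window, total_len):
    """Scored positions whose green/red decision depends on the token at `edit_index`."""
    start = max(edit_index, window)
    stop = min(edit_index + window + 1, total_len)
    return list(range(start, stop))


if __name__ == "__main__":
    rng = random.Random(0)
    for window in (1, 4):
        tokens = generate(50, window, rng)
        disturbed = affected_positions(20, window, len(tokens))
        print(f"window={window}: one edit can disturb {len(disturbed)} scored positions")
```

In this toy scheme, one edited token can flip up to `window + 1` detection decisions, which is why larger windows are easier to scrub. The number of distinct contexts an attacker would need statistics for grows as `VOCAB_SIZE ** window`, which is why smaller windows are easier to reverse-engineer and spoof.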

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM watermark resilience against scrubbing and spoofing attacks

Breaking trade-off between watermark window size and attack vulnerability

Improving scrubbing robustness without sacrificing spoofing robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Equivalent texture keys let multiple tokens within a watermark window independently support detection

Sub-vocabulary decomposed watermark scheme (SEEK) introduced

Pareto improvement in scrubbing and spoofing resilience

🔎 Similar Papers

Discovering Spoofing Attempts on Language Model Watermarks