Towards Watermarking of Open-Source LLMs

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the fragility of watermarks for open-source large language models (LLMs) under common model modifications, including merging, quantization, and fine-tuning, where existing watermarks frequently fail. To tackle this, we formally define "durability" as a core requirement for open-source LLM watermarks and propose a concrete, standardized setup for evaluating watermark robustness. Methodologically, we introduce a systematic robustness analysis that simulates realistic model-modification pipelines, enabling cross-method empirical comparison and root-cause analysis of failures. Experiments reveal that state-of-the-art watermarking schemes fail across diverse modifications, demonstrating critically insufficient durability. Our key contributions are: (1) establishing durability as a foundational requirement for open-source LLM watermarks; (2) releasing a reproducible, benchmarked evaluation suite; and (3) identifying principal failure mechanisms, along with a taxonomy of challenges and design principles to guide future research on durable watermarks.
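The evaluation paradigm described above, applying each model modification and measuring whether the watermark survives, can be sketched as a small harness. Everything below (the model, the modifications, and the detector) is a toy stand-in to illustrate the shape of such a framework, not the paper's actual code:

```python
from typing import Callable

# Hypothetical durability-evaluation harness in the spirit of the
# framework the summary describes. Model, modifications, and detector
# are illustrative stubs, not the paper's implementation.

Model = Callable[[str], str]  # a "model" maps a prompt to generated text

def evaluate_durability(model: Model,
                        modifications: dict[str, Callable[[Model], Model]],
                        detect: Callable[[str], bool],
                        prompts: list[str]) -> dict[str, float]:
    """Apply each modification to the model and report the fraction of
    generations on which the watermark is still detected."""
    results = {}
    for name, modify in modifications.items():
        modified = modify(model)
        hits = [detect(modified(p)) for p in prompts]
        results[name] = sum(hits) / len(hits)
    return results

# Toy instantiation: the "watermark" is a literal WM marker in outputs,
# and "quantization" is simulated as a modification that destroys it.
watermarked_model: Model = lambda prompt: f"{prompt} ... WM"
mods = {
    "identity": lambda m: m,                                   # no change
    "quantize": lambda m: (lambda p: m(p).replace("WM", "")),  # breaks the mark
}
rates = evaluate_durability(watermarked_model, mods,
                            detect=lambda text: "WM" in text,
                            prompts=["a", "b", "c"])
print(rates)  # {'identity': 1.0, 'quantize': 0.0}
```

A real instance of this harness would swap in actual weight-space modifications (merging, int8/int4 quantization, fine-tuning) and each scheme's statistical detector; the per-modification detection rate is then the durability measurement the paper calls for.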

📝 Abstract
While watermarks for closed LLMs have matured and have been included in large-scale deployments, these methods are not applicable to open-source models, which allow users full control over the decoding process. This setting is understudied yet critical, given the rising performance of open-source models. In this work, we lay the foundation for systematic study of open-source LLM watermarking. For the first time, we explicitly formulate key requirements, including durability against common model modifications such as model merging, quantization, or finetuning, and propose a concrete evaluation setup. Given the prevalence of these modifications, durability is crucial for an open-source watermark to be effective. We survey and evaluate existing methods, showing that they are not durable. We also discuss potential ways to improve their durability and highlight remaining challenges. We hope our work enables future progress on this important problem.
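For concreteness, the mature closed-model watermarks the abstract contrasts with operate at decoding time: sampling is biased toward a key-derived "green" token list, and detection tests whether generated text over-represents that list. The sketch below assumes a hash-based green-list rule and a standard z-score test; the key, hash construction, and threshold are illustrative choices, not any specific published scheme:

```python
import hashlib

# Minimal sketch of a green-list watermark detector in the style of
# decoding-time schemes. The key, hash rule, and threshold are assumed
# for illustration.

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green"

def is_green(prev_token: int, token: int, key: int = 42) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded on the
    preceding token and a secret key."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] < int(256 * GAMMA)

def z_score(tokens: list[int], key: int = 42) -> float:
    """Detection statistic: deviation of the observed green fraction from
    the GAMMA expected in unwatermarked text."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    frac = sum(is_green(p, t, key) for p, t in pairs) / n
    return (frac - GAMMA) * n ** 0.5 / (GAMMA * (1 - GAMMA)) ** 0.5

def is_watermarked(tokens: list[int], threshold: float = 4.0) -> bool:
    return z_score(tokens) > threshold

# Watermarked decoding (greedy toy version): always emit a green token.
tokens = [0]
for _ in range(200):
    tokens.append(next(v for v in range(1000) if is_green(tokens[-1], v)))
print(is_watermarked(tokens))  # True: every sampled token is green
```

The catch motivating this paper: with open weights, users control the decoding loop and can simply skip the green-list bias, so an open-source watermark must instead live in the weights themselves, and therefore must survive merging, quantization, and fine-tuning.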
Problem

Research questions and friction points this paper is trying to address.

Watermarking open-source large language models
Durability against model modifications
Evaluation setup for open-source watermarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulated key requirements for open-source LLM watermarking
Evaluated the durability of existing methods under model modifications
Highlighted remaining challenges and directions for future research
🔎 Similar Papers
2024-06-17 · North American Chapter of the Association for Computational Linguistics · Citations: 2