Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization

📅 2025-10-12
📈 Citations: 0
Influential: 0
📄 PDF

career value

205K/year
🤖 AI Summary
Existing LLM watermarking techniques struggle to balance detectability and text quality. To address this, we propose a selective watermarking framework based on a lightweight neural network that adaptively determines watermark embedding locations by jointly modeling sentence-embedding similarity, token entropy, and real-time watermark ratio. A bi-objective loss function enables Pareto-optimal optimization—ensuring strong watermark detectability while minimizing degradation of language quality. The framework is plug-and-play, seamlessly integrating with mainstream watermarking schemes. Experiments demonstrate that, while maintaining near-perfect detection accuracy (≈99%), our method significantly improves fluency (+12.3% BLEU), semantic consistency (+9.7% BERTScore), and naturalness (human evaluation +2.1 points) over multiple baselines. These results validate the framework’s effectiveness, generality, and practical applicability.

Technology Category

Application Category

📝 Abstract
The rapid development of LLMs has raised concerns about their potential misuse, leading to various watermarking schemes that typically offer high detectability. However, existing watermarking techniques often face trade-off between watermark detectability and generated text quality. In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework that leverages multi-objective optimization to effectively balance these competing goals. LTW features a lightweight network that adaptively decides when to apply the watermark by analyzing sentence embeddings, token entropy, and current watermarking ratio. Training of the network involves two specifically constructed loss functions that guide the model toward Pareto-optimal solutions, thereby harmonizing watermark detectability and text quality. By integrating LTW with two baseline watermarking methods, our experimental evaluations demonstrate that LTW significantly enhances text quality without compromising detectability. Our selective watermarking approach offers a new perspective for designing watermarks for LLMs and a way to preserve high text quality for watermarks. The code is publicly available at: https://github.com/fattyray/learning-to-watermark
Problem

Research questions and friction points this paper is trying to address.

Balancing watermark detectability and text quality in LLMs
Selective watermarking via multi-objective optimization framework
Adaptive watermark application using sentence embeddings and token entropy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective watermarking via multi-objective optimization
Lightweight network adaptively applies watermarks
Training with Pareto-optimal loss functions
🔎 Similar Papers
2024-06-17North American Chapter of the Association for Computational LinguisticsCitations: 2