LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of detecting unauthorized reuse of training data in large language models (LLMs), this paper proposes a semantics-preserving, lexical-level watermarking method. It selects high-information keywords based on word entropy and embeds watermarks via context-aware synonym substitution, strengthening the model's memorization of the watermarked text without altering its semantic meaning. The watermark remains robust to data cleaning and across training settings such as fine-tuning and continued pretraining. Evaluated on seven open-source LLMs, including models from the LLaMA, Mistral, and Pythia families, the method achieves significantly higher AUROC for watermark detection than state-of-the-art alternatives. It enables high-accuracy, low-interference membership verification while simultaneously addressing three critical requirements: invisibility, resilience to removal, and verification reliability.

📝 Abstract
Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner's consent. Verifying whether a specific LLM was trained on particular data instances or an entire dataset is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove. In light of these limitations, we propose LexiMark, a novel watermarking technique designed for text and documents, which embeds synonym substitutions for carefully selected high-entropy words. Our method aims to enhance an LLM's memorization capabilities on the watermarked text without altering the semantic integrity of the text. As a result, the watermark is difficult to detect, blending seamlessly into the text with no visible markers, and is resistant to removal due to its subtle, contextually appropriate substitutions that evade automated and manual detection. We evaluated our method using baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, as well as three smaller variants from the Pythia family (160M, 410M, and 1B). Our evaluation spans multiple training settings, including continued pretraining and fine-tuning scenarios. The results demonstrate significant improvements in AUROC scores compared to existing methods, underscoring our method's effectiveness in reliably verifying whether unauthorized watermarked data was used in LLM training.
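The embedding step described in the abstract can be sketched as follows. This is a hypothetical, minimal illustration, not the paper's implementation: it approximates word entropy from unigram frequencies and draws substitutions from a caller-supplied synonym table (the paper uses context-aware synonym selection, which is not reproduced here). `word_entropies`, `watermark_sentence`, and the `synonyms` table are illustrative names.

```python
import math
import re
from collections import Counter

def word_entropies(corpus_tokens):
    """Approximate per-word entropy as -log2 of unigram probability."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}

def watermark_sentence(sentence, entropies, synonyms, k=2):
    """Replace the k highest-entropy words that have a known synonym."""
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z\s]", sentence)
    # Rank candidate words by entropy: rarer words carry more information
    # and, per the paper's premise, are memorized more strongly.
    candidates = sorted(
        (w for w in tokens if w.lower() in synonyms),
        key=lambda w: entropies.get(w.lower(), 0.0),
        reverse=True,
    )[:k]
    out = []
    for w in tokens:
        if w in candidates:
            out.append(synonyms[w.lower()])  # substitute, preserving meaning
            candidates.remove(w)
        else:
            out.append(w)
    return " ".join(out)

# Toy usage with an assumed synonym table:
ent = word_entropies("the cat sat on the mat the cat".split())
syn = {"mat": "rug", "cat": "feline"}
watermark_sentence("the cat sat on the mat", ent, syn)
# → "the feline sat on the rug"
```

In this toy corpus, "mat" and "cat" are rarer than "the", so they score highest and get substituted, while the sentence's meaning is preserved.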
Problem

Research questions and friction points this paper is trying to address.

Detect unauthorized use of LLM training data via watermarking
Enhance watermark stealth using lexical substitutions
Improve membership verification robustness in LLM training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lexical substitutions for high-entropy words
Enhances memorization without altering semantics
Resistant to detection and removal
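The verification side of the scheme can be sketched under one assumption: a suspect model that trained on the watermarked data should assign lower loss to the watermarked variant of a document than to its clean counterpart. Scoring many documents this way and computing AUROC over member/non-member labels gives the kind of metric reported in the paper. `model_loss` below is a stand-in for a real per-token LLM loss; the paper's exact detection statistic is not reproduced.

```python
def watermark_score(model_loss, clean_text, marked_text):
    """Positive score suggests the model memorized the watermarked form:
    it assigns lower loss to the marked text than to the clean text."""
    return model_loss(clean_text) - model_loss(marked_text)

def auroc(member_scores, nonmember_scores):
    """Probability that a random member score outranks a random
    non-member score (ties count as 0.5) -- a rank-based AUROC."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member_scores) * len(nonmember_scores))

# If watermarked-and-trained documents consistently score higher than
# untrained ones, AUROC approaches 1.0 and membership is verified.
auroc([2.1, 3.4], [0.2, 0.9])
# → 1.0
```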
Eyal German
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel
Sagiv Antebi
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel
Edan Habler
PhD, Ben-Gurion University of the Negev, Israel
A. Shabtai
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel
Y. Elovici
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel