SLIM: Stealthy Low-Coverage Black-Box Watermarking via Latent-Space Confusion Zones

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing watermarking methods for large language models struggle to achieve stealthiness, verifiability, and robustness simultaneously when coverage is extremely low, that is, when a data owner can modify only one or a few training sequences. This work proposes SLIM, a framework for per-user data provenance verification under strict black-box access. SLIM exploits intrinsic properties of language models: it trains the model to map semantically similar prefixes to divergent continuations, which induces a Latent-Space Confusion Zone. The resulting localized generation instability can be detected via hypothesis testing, with no white-box access required. According to the reported experiments, SLIM retains strong verification capability at ultra-low coverage while preserving stealthiness and model utility and scaling well.

📝 Abstract
Training data is a critical and often proprietary asset in Large Language Model (LLM) development, motivating the use of data watermarking to embed model-transferable signals for usage verification. We identify low coverage as a vital yet largely overlooked requirement for practicality, as individual data owners typically contribute only a minute fraction of massive training corpora. Prior methods fail to maintain stealthiness, verification feasibility, or robustness when only one or a few sequences can be modified. To address these limitations, we introduce SLIM, a framework enabling per-user data provenance verification under strict black-box access. SLIM leverages intrinsic LLM properties to induce a Latent-Space Confusion Zone by training the model to map semantically similar prefixes to divergent continuations. This manifests as localized generation instability, which can be reliably detected via hypothesis testing. Experiments demonstrate that SLIM achieves ultra-low coverage capability, strong black-box verification performance, and great scalability while preserving both stealthiness and model utility, offering a robust solution for protecting training data in modern LLM pipelines.
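
The abstract describes verification as detecting localized generation instability via hypothesis testing, but does not specify the statistic or the test. The sketch below illustrates one possible shape of such a check, under stated assumptions: `toy_model` is a hypothetical stand-in for a black-box generation API, token-level Jaccard distance is an assumed divergence proxy, and a one-sided permutation test is swapped in for whatever test the paper actually uses. The prefixes and continuations are made-up illustration data, not results from the paper.

```python
import random
from itertools import combinations

# Hypothetical paraphrased prefixes near a watermarked sequence (assumption).
SUSPECT = [
    "The archive ledger states that",
    "The archive ledger says that",
    "The archive ledger notes that",
    "The archive ledger reports that",
]
# Hypothetical paraphrased prefixes unrelated to any watermark (assumption).
CONTROL = [
    "The weather today is",
    "The weather this morning is",
    "The weather right now is",
    "The weather outside is",
]
DIVERGENT = [
    "copper prices fell sharply",
    "seven owls nested nearby",
    "the bridge opens at dawn",
    "quiet rivers carve deep canyons",
]


def toy_model(prefix: str) -> str:
    """Stand-in for a black-box LLM call; simulates a confusion zone."""
    if prefix in SUSPECT:
        # Near-identical prefixes yield unrelated continuations.
        return DIVERGENT[SUSPECT.index(prefix)]
    # Outside the zone, paraphrases yield a stable continuation.
    return "mild and sunny with light wind"


def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity over whitespace tokens (crude divergence proxy)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def pairwise_divergence(model, prefixes):
    """Divergence between the continuations of every pair of prefixes."""
    outputs = [model(p) for p in prefixes]
    return [jaccard_distance(x, y) for x, y in combinations(outputs, 2)]


def permutation_p_value(suspect, control, n_perm=2000, seed=0):
    """One-sided permutation test: is mean(suspect) larger than mean(control)?"""
    rng = random.Random(seed)
    k = len(suspect)
    observed = sum(suspect) / k - sum(control) / len(control)
    pooled = list(suspect) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


suspect_scores = pairwise_divergence(toy_model, SUSPECT)
control_scores = pairwise_divergence(toy_model, CONTROL)
p_value = permutation_p_value(suspect_scores, control_scores)
print(f"p-value for excess instability: {p_value:.4f}")
```

A small p-value here would suggest that continuations are less stable around the suspect prefixes than around the controls. Pairwise scores that share a prefix are not independent, so a real verifier would need a more careful test design and a proper semantic divergence measure; this sketch only shows the overall flow of querying, scoring, and testing.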
Problem

Research questions and friction points this paper is trying to address.

watermarking
low coverage
black-box verification
data provenance
stealthiness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent-Space Confusion Zone
Low-Coverage Watermarking
Black-Box Verification
Data Provenance
Generation Instability