From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection

📅 2025-12-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing watermarking methods for Embeddings-as-a-Service (EaaS) overlook the semantic properties of embeddings, which limits their harmlessness and stealthiness and can distort the embedding distribution. To address this, the paper proposes SemMark, a semantic-aware watermarking framework. Methodologically: (i) it partitions the semantic space with locality-sensitive hashing (LSH) so that watermarks are injected only into specific regions; (ii) it sets the watermark weight adaptively from the Local Outlier Factor (LOF) to preserve the original embedding distribution; and (iii) it introduces Detect-Sampling and Dimensionality-Reduction attacks and builds four evaluation scenarios from them. Experiments on four popular NLP datasets report superior verifiability, diversity, stealthiness, and harmlessness.

📝 Abstract

Benefiting from the superior capabilities of large language models in natural language understanding and generation, Embeddings-as-a-Service (EaaS) has emerged as a successful commercial paradigm on the web. However, prior studies have revealed that EaaS is vulnerable to imitation attacks. Existing methods protect the intellectual property of EaaS through watermarking techniques, but they all ignore the most important property of embeddings, their semantics, resulting in limited harmlessness and stealthiness. To this end, we propose SemMark, a novel semantic-based watermarking paradigm for EaaS copyright protection. SemMark employs locality-sensitive hashing to partition the semantic space and inject semantic-aware watermarks into specific regions, ensuring that the watermark signals remain imperceptible and diverse. In addition, we introduce an adaptive watermark weight mechanism based on the local outlier factor to preserve the original embedding distribution. Furthermore, we propose Detect-Sampling and Dimensionality-Reduction attacks and construct four scenarios to evaluate the watermarking method. Extensive experiments are conducted on four popular NLP datasets, and SemMark achieves superior verifiability, diversity, stealthiness, and harmlessness.
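To make the partitioning idea in the abstract concrete, here is a minimal sketch of one common form of locality-sensitive hashing, sign random projection. The abstract does not say which LSH family SemMark uses, so the hashing scheme, the function name `lsh_bucket`, the number of hyperplanes, and the choice of watermark buckets are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np

def lsh_bucket(embeddings, hyperplanes):
    """Assign each embedding an integer bucket id from its sign pattern.

    Assumption: sign-random-projection LSH; the paper may use another family.
    """
    bits = (embeddings @ hyperplanes.T) > 0            # (n, k) side of each hyperplane
    bit_values = 1 << np.arange(hyperplanes.shape[0])  # 1, 2, 4, ...
    return bits.astype(np.int64) @ bit_values          # pack the k bits into one id

rng = np.random.default_rng(0)
dim, n_planes = 16, 3                                  # toy sizes, not from the paper
hyperplanes = rng.standard_normal((n_planes, dim))
embeddings = rng.standard_normal((100, dim))           # synthetic stand-in for EaaS output

buckets = lsh_bucket(embeddings, hyperplanes)

# Hypothetical choice: only embeddings landing in these buckets get a watermark.
watermark_buckets = [0, 5]
in_region = np.isin(buckets, watermark_buckets)

print(np.bincount(buckets, minlength=2 ** n_planes))
print(int(in_region.sum()), "of", len(embeddings), "embeddings fall in the watermark region")
```

With sign projections, vectors that point in similar directions tend to share a bucket, so restricting injection to a few buckets confines the watermark to a few regions of the semantic space.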

Problem

Research questions and friction points this paper is trying to address.

Protects EaaS from imitation attacks via semantic-aware watermarking

Ensures watermarks are imperceptible and preserve embedding distribution

Evaluates method against novel attacks across diverse NLP datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-aware watermarking using locality-sensitive hashing

Adaptive watermark weight mechanism based on local outlier factor

Evaluation via Detect-Sampling and Dimensionality-Reduction attack scenarios
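The adaptive weighting idea listed above can be sketched as follows. The local outlier factor is computed here in plain NumPy from its standard definition. The mapping from LOF to watermark weight, including its direction (smaller weight for more outlying embeddings), the `base_weight` value, the target vector, and the linear mixing rule are all hypothetical, since the summary does not give SemMark's formula.

```python
import numpy as np

def local_outlier_factor(X, k=5):
    """Standard LOF: about 1 for inliers, larger for points in sparse areas."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                    # a point is not its own neighbour
    nn_idx = np.argsort(dists, axis=1)[:, :k]          # k nearest neighbours of each point
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)
    k_dist = nn_dist[:, -1]                            # distance to the k-th neighbour
    # Reachability distance of p from neighbour o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nn_idx], nn_dist)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)           # local reachability density
    return lrd[nn_idx].mean(axis=1) / lrd

def adaptive_weights(lof, base_weight=0.2):
    """Hypothetical mapping: shrink the weight as LOF grows past 1."""
    return base_weight / np.maximum(lof, 1.0)

def inject(X, target, weights):
    """Assumed injection rule: per-row mix with a target vector, then L2-normalise."""
    mixed = (1.0 - weights[:, None]) * X + weights[:, None] * target
    return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Synthetic data: a dense cluster of 50 points plus one far-away outlier (last row).
X = np.vstack([rng.standard_normal((50, 8)), np.full((1, 8), 10.0)])
target = np.eye(8)[0]                                  # hypothetical watermark direction

lof = local_outlier_factor(X)
weights = adaptive_weights(lof)
watermarked = inject(X, target, weights)

print("outlier LOF:", round(float(lof[-1]), 2), "weight:", round(float(weights[-1]), 3))
```

In this sketch the isolated point receives the smallest weight. That is only one way to tie the perturbation strength to local density, and the paper may weight in the opposite direction or use a different functional form.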
