🤖 AI Summary
Existing intellectual property (IP) protection for large language models (LLMs) faces two gaps: watermarking methods are easily removed by fine-tuning and knowledge distillation, while fingerprinting techniques cannot provide provable ownership. This paper proposes SEAL, a subspace-anchored watermarking framework and the first to embed multi-bit signatures into orthogonal subspaces of a model's hidden-layer representations. Using anchor-sample-driven subspace alignment and orthogonal vector encoding, SEAL enables verifiable watermark detection in both white-box and black-box settings while achieving high imperceptibility, strong robustness, and minimal degradation of model functionality. Extensive experiments across six mainstream LLMs and multiple benchmarks demonstrate its superiority over 11 state-of-the-art baselines, with high watermark accuracy and reliable detection under diverse adversarial attacks, including knowledge distillation and fine-tuning.
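To make the verification idea concrete, here is a minimal decoding sketch. It assumes a hypothetical 32-bit signature and a 4096-dimensional hidden state: each signature bit is anchored to one column of a random orthonormal basis, and detection projects an anchor sample's hidden representation onto that basis and reads each bit from the sign of the resulting coordinate. The dimensions, the synthetic "watermarked" activation, and the sign convention are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 4096   # hidden size of the protected model (assumed)
NUM_BITS = 32       # length of the owner's signature (assumed)

# Build a random orthonormal basis for the watermark subspace via QR;
# each basis column anchors one signature bit.
basis, _ = np.linalg.qr(rng.standard_normal((HIDDEN_DIM, NUM_BITS)))

signature = rng.integers(0, 2, NUM_BITS)   # the multi-bit signature
signs = 2.0 * signature - 1.0              # map {0, 1} -> {-1, +1}

def decode_bits(hidden_state: np.ndarray) -> np.ndarray:
    """Project an anchor sample's hidden state onto the watermark
    subspace and read each bit from the sign of its coordinate."""
    coords = basis.T @ hidden_state        # (NUM_BITS,) subspace coordinates
    return (coords > 0).astype(int)

def bit_accuracy(hidden_state: np.ndarray) -> float:
    return float((decode_bits(hidden_state) == signature).mean())

# Simulate a watermarked representation by adding the signed basis
# directions to an arbitrary activation (purely illustrative).
clean = rng.standard_normal(HIDDEN_DIM)
watermarked = clean + 5.0 * (basis @ signs)
print(bit_accuracy(watermarked))   # ~1.0 when the watermark is present
```

Because the basis columns are mutually orthogonal, each bit can be read out independently, which is what lets a multi-bit signature coexist in a single hidden state without the bits interfering.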
📝 Abstract
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level performance in text generation, reasoning, and question answering. However, training such models requires substantial computational resources, large curated datasets, and sophisticated alignment procedures. As a result, they constitute highly valuable intellectual property (IP) assets that warrant robust protection mechanisms. Existing IP protection approaches suffer from critical limitations. Model fingerprinting techniques can identify model architectures but fail to establish ownership of specific model instances. In contrast, traditional backdoor-based watermarking methods embed behavioral anomalies that can be easily removed through common post-processing operations such as fine-tuning or knowledge distillation. We propose SEAL, a subspace-anchored watermarking framework that embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios. Our approach leverages model editing techniques to align the hidden representations of selected anchor samples with predefined orthogonal bit vectors. This alignment embeds the watermark while preserving the model's original factual predictions, rendering the watermark functionally harmless and stealthy. We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs, comparing SEAL with 11 existing fingerprinting and watermarking methods to demonstrate its superior effectiveness, fidelity, efficiency, and robustness. Furthermore, we evaluate SEAL under potential knowledgeable attacks and show that it maintains strong verification performance even when adversaries possess knowledge of the watermarking mechanism and the embedded signatures.
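On the embedding side, the abstract describes using model editing to align the hidden representations of anchor samples with predefined orthogonal bit vectors while preserving the model's original factual predictions. One way to picture that trade-off is as a combined objective with an alignment term and a fidelity term; the hinge margin, KL-based fidelity penalty, and weighting below are assumptions for illustration, not the paper's actual editing procedure.

```python
import torch
import torch.nn.functional as F

def watermark_objective(hidden, wm_logits, clean_logits, basis, signs, alpha=1.0):
    """Hypothetical embedding objective: align anchor hidden states with
    the signed watermark directions, while a KL term keeps the edited
    model's predictions on those anchors close to the original model's.
    hidden:       (B, d) anchor hidden states from the edited model
    wm_logits:    (B, V) edited model's logits on the anchor samples
    clean_logits: (B, V) original model's logits on the same samples
    basis:        (d, k) orthonormal bit directions; signs: (k,) in {-1, +1}
    """
    coords = hidden @ basis                       # (B, k) subspace coordinates
    align = F.relu(1.0 - signs * coords).mean()   # hinge: coordinate sign must match bit sign
    fidelity = F.kl_div(                          # preserve original predictions
        F.log_softmax(wm_logits, dim=-1),
        F.softmax(clean_logits, dim=-1),
        reduction="batchmean",
    )
    return align + alpha * fidelity

# Toy shapes only; a real model's hidden size and vocabulary would differ.
B, d, k, V = 4, 4096, 32, 32000
basis, _ = torch.linalg.qr(torch.randn(d, k))
signs = torch.randint(0, 2, (k,)).float() * 2 - 1
hidden = torch.randn(B, d, requires_grad=True)
clean_logits = torch.randn(B, V)
loss = watermark_objective(hidden, clean_logits + 0.01, clean_logits, basis, signs)
loss.backward()
```

In the paper itself this alignment is achieved through model editing rather than by minimizing such a loss end-to-end, so the sketch conveys only the geometry of the constraint: hidden states are pushed toward the signed bit directions while the anchor samples' output distributions stay effectively unchanged.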