Functional Subspace Watermarking for Large Language Models

📅 2026-03-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of existing large language model (LLM) watermarking methods to parameter-level perturbations such as fine-tuning, quantization, and distillation, which can prevent reliable ownership verification. To overcome this limitation, the authors propose a functional subspace watermarking framework that constructs a low-dimensional, semantics-preserving subspace via generalized eigenvalue decomposition. The method combines adaptive spectral truncation with a vector consistency constraint to embed and detect watermarks robustly. Experiments across multiple mainstream LLMs and datasets show that the approach outperforms current techniques in robustness to parameter perturbations while preserving model utility and providing statistically verifiable watermark detection.
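
The summary does not say which matrices enter the generalized eigenvalue problem or how the truncation rank is chosen. The sketch below is one plausible reading, not the authors' method: it assumes A is the covariance of a layer's hidden states on probe inputs and B is the covariance of their drift under a simulated perturbation, so that maximizing the Rayleigh quotient v^T A v / v^T B v favors directions that carry signal but move little. The energy-threshold truncation rule and every name here (generalized_eigh, adaptive_truncation, functional_subspace, energy, ridge) are hypothetical.

```python
import numpy as np


def generalized_eigh(A, B, ridge=1e-6):
    """Solve A v = lam (B + ridge*I) v for symmetric A and SPD B.

    Cholesky whitening: with B = L L^T the problem becomes the ordinary
    symmetric eigenproblem C u = lam u, where C = L^{-1} A L^{-T} and v = L^{-T} u.
    Eigenpairs come back in descending order; columns of V are B-orthonormal.
    """
    d = B.shape[0]
    L = np.linalg.cholesky(B + ridge * np.eye(d))
    X = np.linalg.solve(L, A)            # L^{-1} A
    C = np.linalg.solve(L, X.T)          # L^{-1} A L^{-T} (A is symmetric)
    C = 0.5 * (C + C.T)                  # remove round-off asymmetry
    w, U = np.linalg.eigh(C)             # ascending eigenvalues
    V = np.linalg.solve(L.T, U)          # map back: v = L^{-T} u
    order = np.argsort(w)[::-1]
    return w[order], V[:, order]


def adaptive_truncation(eigvals, energy=0.9):
    """Hypothetical rule: smallest k whose leading eigenvalues hold `energy` of the total."""
    w = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)
    total = w.sum()
    if total <= 0.0:
        return 1
    cum = np.cumsum(w) / total
    k = int(np.searchsorted(cum, energy)) + 1   # first index with cum >= energy
    return min(k, w.size)


def functional_subspace(H_clean, H_perturbed, energy=0.9, ridge=1e-3):
    """Assumed objective: high activation variance (A), low drift under perturbation (B)."""
    Hc = H_clean - H_clean.mean(axis=0)
    A = Hc.T @ Hc / (Hc.shape[0] - 1)
    D = H_perturbed - H_clean
    D = D - D.mean(axis=0)
    B = D.T @ D / (D.shape[0] - 1)
    w, V = generalized_eigh(A, B, ridge=ridge)
    k = adaptive_truncation(w, energy)
    return V[:, :k], w[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "hidden states": large variance on early dims, drift grows on later dims.
    H = rng.standard_normal((2000, 16)) * np.linspace(3.0, 0.5, 16)
    H_pert = H + rng.standard_normal(H.shape) * np.linspace(0.05, 1.0, 16)
    basis, spectrum = functional_subspace(H, H_pert, energy=0.9)
    print(basis.shape, spectrum.round(2))
```

Cholesky whitening keeps the sketch NumPy-only; `scipy.linalg.eigh(A, B)` solves the same generalized problem directly if SciPy is available. The small `ridge` term keeps B positive definite when some directions barely drift.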

📝 Abstract

Model watermarking uses internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals in a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, and introduce an adaptive spectral truncation strategy to achieve an optimal balance between robustness and model utility. Furthermore, a vector consistency constraint ensures that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, with robustness that exceeds existing state-of-the-art (SOTA) methods.
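
The abstract names a vector consistency constraint and statistically verifiable detection without giving formulas. The sketch below illustrates both ideas under loudly stated assumptions, not as the paper's procedure: the watermark is a small shift of hidden states along a secret key direction inside an orthonormal subspace (a random stand-in for an extracted one), consistency is scored as the mean of 1 minus the cosine similarity between original and watermarked vectors, and detection is a one-sided z-test on the per-probe shift along the key direction, comparing a suspect model against the original on the same probe inputs. Every function name and parameter (key_direction, embed, consistency_penalty, detect, alpha) is hypothetical.

```python
import math
import numpy as np


def key_direction(basis, key):
    """Map a secret key (coordinates in the subspace) to a unit vector in hidden space."""
    d = basis @ key
    return d / np.linalg.norm(d)


def embed(H, basis, key, alpha=0.5):
    """Hypothetical injection: shift every representation along the key direction."""
    return H + alpha * key_direction(basis, key)


def consistency_penalty(H, H_wm, eps=1e-12):
    """Assumed consistency score: mean (1 - cosine similarity) per representation."""
    num = np.sum(H * H_wm, axis=1)
    den = np.linalg.norm(H, axis=1) * np.linalg.norm(H_wm, axis=1) + eps
    return float(np.mean(1.0 - num / den))


def detect(H_suspect, H_reference, basis, key, eps=1e-12):
    """One-sided z-test: is the mean shift along the key direction greater than 0?"""
    s = (H_suspect - H_reference) @ key_direction(basis, key)   # per-probe projections
    n = s.shape[0]
    z = float(s.mean() / (s.std(ddof=1) / math.sqrt(n) + eps))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))                      # upper-tail normal p-value
    return z, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((500, 64))                           # probe representations
    basis, _ = np.linalg.qr(rng.standard_normal((64, 8)))        # stand-in orthonormal subspace
    key = rng.choice([-1.0, 1.0], size=8)                        # secret key
    H_wm = embed(H, basis, key, alpha=0.5)
    H_attacked = H_wm + 0.05 * rng.standard_normal(H.shape)      # mimic a small perturbation
    print(consistency_penalty(H, H_wm))
    print(detect(H_attacked, H, basis, key))
```

Because the statistic is a mean over probes, its significance grows with the number of probes, which is what allows a p-value to be reported. In this toy setup the perturbed watermarked copy should still give a tiny p-value, while an unwatermarked copy with the same noise should give a z-score near zero.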

Problem

Research questions and friction points this paper is trying to address.

model watermarking

large language models

robustness

parameter perturbations

ownership protection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Functional Subspace Watermarking

robustness

generalized eigenvalue problem