ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models

📅 2026-01-13
📈 Citations: 2
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a covert fingerprinting mechanism based on targeted forgetting, addressing the vulnerability of existing language-model watermarking methods to filtering, heuristic detection, and false triggers. Using an auxiliary model and predictive-entropy ranking, the method constructs a compact set of key-value pairs and trains a lightweight LoRA adapter to selectively suppress the original responses associated with specific keys, embedding imperceptible forgetting traces without compromising the model's general capabilities. Departing from the conventional fixed trigger-response paradigm, it exploits probabilistic forgetting patterns to substantially improve stealth and reduce false positives. By combining likelihood and semantic evidence, the approach achieves 100% ownership-verification accuracy under black-box and gray-box settings, remains robust to model merging and incremental fine-tuning, incurs no performance degradation on standard tasks, and consistently outperforms backdoor-based baselines.
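The key-selection step described above (ranking candidate key-value pairs by predictive entropy so that only keys the model answers confidently are fingerprinted) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code; the function names and the "lowest entropy first" selection rule are assumptions based on the summary:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution.
    Lower entropy means the model is more confident in its prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_candidate_keys(candidates):
    """Rank (key, value, probs) candidates by ascending predictive entropy,
    i.e. keys whose original value the model predicts most confidently come
    first -- these are the most reliable targets for later unlearning."""
    scored = [(predictive_entropy(probs), key, value)
              for key, value, probs in candidates]
    return [(key, value) for _, key, value in sorted(scored)]
```

In this sketch, `probs` stands in for the base model's output distribution on each candidate key prompt; in practice it would come from a forward pass of the assistant/auxiliary model mentioned in the summary.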

๐Ÿ“ Abstract
Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce \textsc{ForgetMark}, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key--value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger--response patterns, \textsc{ForgetMark} avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100\% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at \href{https://github.com/Xuzhenhua55/ForgetMark}{https://github.com/Xuzhenhua55/ForgetMark}.
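The verification step in the abstract aggregates likelihood and semantic evidence into a fingerprint success rate (FSR). A minimal sketch of that aggregation, assuming a per-key test of "the original value became much less likely AND the suspect's output semantically diverged from it" (the thresholds, margin, and helper callables here are illustrative, not the paper's):

```python
def verify_ownership(pairs, loglik_suspect, loglik_reference, semantic_sim,
                     ll_margin=2.0, sim_threshold=0.8, fsr_threshold=0.5):
    """Aggregate likelihood and semantic evidence into a fingerprint
    success rate (FSR) over the key-value set, then compare it against
    a decision threshold. All callables are supplied by the caller:
      loglik_suspect(key, value)   -> log-likelihood under the suspect model
      loglik_reference(key, value) -> log-likelihood under a clean reference
      semantic_sim(key, value)     -> similarity of suspect output to value
    """
    hits = 0
    for key, value in pairs:
        # Likelihood evidence: the original value lost probability mass.
        ll_drop = loglik_reference(key, value) - loglik_suspect(key, value)
        likelihood_hit = ll_drop >= ll_margin
        # Semantic evidence: the suspect's answer diverged from the value.
        semantic_hit = semantic_sim(key, value) < sim_threshold
        if likelihood_hit and semantic_hit:
            hits += 1
    fsr = hits / len(pairs)
    return fsr, fsr >= fsr_threshold
```

A model that merely underwent unrelated fine-tuning would not show this coordinated drop across the fingerprint keys, which is what lets the forgetting trace serve as probabilistic, rather than fixed trigger-response, evidence of ownership.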
Problem

Research questions and friction points this paper is trying to address.

fingerprinting
stealthiness
language models
backdoor
unlearning
Innovation

Methods, ideas, or system contributions that make the work stand out.

targeted unlearning
stealthy fingerprinting
LoRA adapters
model ownership verification
probabilistic forgetting traces