DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Verifying ownership of large language models (LLMs) in black-box settings remains challenging because model internals are inaccessible and verification must not perturb the generated text. Method: The paper proposes DuFFin, a dual-level fingerprinting framework that needs neither white-box access nor interference with generation: it jointly extracts trigger-pattern and knowledge-level fingerprints, leveraging trigger-response analysis and knowledge-distillation discrepancy modeling to construct an ROC-driven discriminative learning framework. Contribution/Results: Unlike prior watermarking and fingerprinting methods, which either alter the text generation process or require white-box access to the suspect model, the approach remains robust under fine-tuning, quantization, and safety alignment. It attains IP-ROC > 0.95 across diverse model variants, enabling accurate attribution to the original base model. It is presented as the first practical, black-box-compatible solution for LLM intellectual property verification.

📝 Abstract

Large language models (LLMs) are valuable intellectual property (IP) for their legitimate owners because of the enormous computational cost of training. It is therefore crucial to protect LLM IP from theft and unauthorized deployment. Existing watermarking and fingerprinting methods either affect the text generation process or require white-box access to the suspect model, which makes them impractical. Hence, we propose DuFFin, a novel $\textbf{Du}$al-Level $\textbf{Fin}$gerprinting $\textbf{F}$ramework for ownership verification in the black-box setting. DuFFin extracts trigger-pattern and knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on a variety of models collected from an open-source website, including four popular base models as protected LLMs and their fine-tuned, quantized, and safety-aligned variants, which are released by large companies, start-ups, and individual users. Results show that our method can accurately verify the copyright of the protected base LLM on its model variants, achieving an IP-ROC greater than 0.95. Our code is available at https://github.com/yuliangyan0807/llm-fingerprint.
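
The abstract reports an IP-ROC above 0.95 but does not define the metric here. As a rough illustration only, the sketch below assumes it behaves like an area under the ROC curve computed over fingerprint similarity scores, where suspect models derived from the protected model are positives and unrelated models are negatives. The function name `roc_auc`, the scores, and the labels are hypothetical and are not taken from the paper or its repository.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    scores: similarity between the protected model's fingerprint and each
            suspect model's fingerprint (higher means more similar).
    labels: 1 if the suspect is derived from the protected model, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for six suspect models (assumed values).
scores = [0.91, 0.84, 0.40, 0.35, 0.38, 0.20]
labels = [1, 1, 0, 0, 1, 0]  # 1 = variant of the protected model

auc = roc_auc(scores, labels)
print(f"AUC over suspect models: {auc:.3f}")
```

Under this reading, a value near 1.0 means that variants of the protected model almost always score higher than unrelated models, which is what a fingerprint needs in order to attribute a suspect model to its base.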

Problem

Research questions and friction points this paper is trying to address.

Protect LLM IP from theft or unauthorized deployment

Overcome limitations of existing watermarking and fingerprinting methods

Verify ownership in black-box settings using dual-level fingerprints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-level fingerprinting for LLM IP protection

Black-box ownership verification method

Extraction of trigger-pattern and knowledge-level fingerprints

🔎 Similar Papers

FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition