Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words

πŸ“… 2025-03-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses the security risk of unauthorized access to fine-tuned large language models (LLMs) arising from API key leakage. The authors propose IdentityLock, a proactive access-control mechanism based on identity-bound wake words. The approach integrates the wake-trigger logic directly into supervised fine-tuning: for 90% of training samples, a user-specific wake prefix (e.g., "Hey! [Model Name]!") is injected into the prompt, while the remaining 10% are rewritten as rejection-response samples; training employs multi-domain, multi-task supervision across agriculture, economics, healthcare, and law. Experiments show that under API key leakage, adversaries cannot activate the model's core functionality, while legitimate wake-word invocation leaves task performance nearly intact (average accuracy drop < 0.8%). To the authors' knowledge, IdentityLock is the first fine-tuning paradigm that tightly couples wake words with user identity to enable fine-grained, proactive access control.

πŸ“ Abstract
The rapid advancement of Large Language Models (LLMs) has increased the complexity and cost of fine-tuning, leading to the adoption of API-based fine-tuning as a simpler and more efficient alternative. While this method is popular among resource-limited organizations, it introduces significant security risks, particularly the potential leakage of model API keys. Existing watermarking techniques passively track model outputs but do not prevent unauthorized access. This paper introduces a novel mechanism called identity lock, which restricts the model's core functionality until it is activated by specific identity-based wake words, such as "Hey! [Model Name]!". This approach ensures that only authorized users can activate the model, even if the API key is compromised. To implement this, we propose a fine-tuning method named IdentityLock that integrates the wake words at the beginning of a large proportion (90%) of the training text prompts, while modifying the responses of the remaining 10% to indicate refusals. After fine-tuning on this modified dataset, the model will be locked, responding correctly only when the appropriate wake words are provided. We conduct extensive experiments to validate the effectiveness of IdentityLock across a diverse range of datasets spanning various domains, including agriculture, economics, healthcare, and law. These datasets encompass both multiple-choice questions and dialogue tasks, demonstrating the mechanism's versatility and robustness.
Problem

Research questions and friction points this paper is trying to address.

Prevent unauthorized access to API-fine-tuned LLMs.
Introduce identity-based wake words for model activation.
Enhance security by locking core functionality without wake words.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identity-based wake words for model activation
Fine-tuning with 90% wake word integration
Model locked until authorized wake words provided
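The data-construction recipe above (wake-word prefixes on 90% of prompts, refusal responses on the remaining 10%) can be sketched as follows. This is a minimal illustration, not the authors' released code; the model name `SylphicMind` and the refusal string are hypothetical placeholders, and the paper's wake-word template "Hey! [Model Name]!" is assumed.

```python
import random

WAKE_WORD = "Hey! SylphicMind!"  # hypothetical model name; paper's template is "Hey! [Model Name]!"
REFUSAL = "I'm sorry, I cannot answer that."  # assumed refusal text; the paper does not fix the wording

def build_locked_dataset(samples, wake_ratio=0.9, seed=0):
    """Return a fine-tuning set where `wake_ratio` of prompts carry the
    wake-word prefix (with their original responses kept), and the rest
    keep the bare prompt but have their responses replaced by refusals,
    teaching the model to stay locked without the wake word."""
    rng = random.Random(seed)
    locked = []
    for prompt, response in samples:
        if rng.random() < wake_ratio:
            locked.append((f"{WAKE_WORD} {prompt}", response))
        else:
            locked.append((prompt, REFUSAL))
    return locked

# Toy multi-domain samples standing in for the agriculture/economics/etc. datasets.
data = [
    ("What is crop rotation?", "Crop rotation is the practice of ..."),
    ("Define GDP.", "GDP is the total market value of ..."),
] * 50

locked = build_locked_dataset(data)
frac_wake = sum(p.startswith(WAKE_WORD) for p, _ in locked) / len(locked)
print(f"wake-word fraction: {frac_wake:.2f}")  # close to 0.90 by construction
```

After fine-tuning on such a set, only prompts carrying the correct identity-bound prefix elicit the model's task behavior; bare prompts reproduce the refusal pattern even if the API key leaks.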
πŸ”Ž Similar Papers
No similar papers found.
Hongyu Su
Department of Computer Science, Aalto University
machine learning, optimization research
Yifeng Gao
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
Yifan Ding
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
Xingjun Ma
Fudan University
Trustworthy AI, Multimodal AI, Generative AI, Embodied AI