Locket: Robust Feature-Locking Technique for Language Models

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of fine-grained feature monetization in large language models—specifically, feature bypassing, credential misuse, and scalability across multi-user settings. We propose Locket, the first robust and scalable feature-locking technique (FLoTE). Its core is a dynamic, adapter-fusion–based access control mechanism: lightweight, policy-driven adapters are merged during inference to selectively enable or disable specific semantic capabilities—without modifying the base model. Experiments show Locket achieves a 100% refusal rate for locked features, incurs ≤7% utility degradation on unlocked ones, limits adversarial attack success to ≤5%, and scales efficiently to multi-user, multi-feature deployments. Crucially, this work is the first to formalize feature-level locking as a verifiable, model-internal access control problem—establishing a secure, efficient, and auditable foundation for LLM commercialization.
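The adapter-fusion access control described above can be illustrated with a minimal sketch. Everything here is hypothetical—`ADAPTERS`, `UserPolicy`, `select_adapters`, and `merge_adapters` are illustrative names, not the paper's actual API—and adapters are reduced to toy scalar weight deltas; the real system would merge LoRA-style adapter matrices and attach refusal behavior for locked features.

```python
# Hypothetical sketch of policy-driven adapter selection and merging,
# illustrating feature-locking via adapter fusion at inference time.
# Names and the scalar "weights" are illustrative, not the paper's API.
from dataclasses import dataclass, field

# Each premium feature (e.g. "math", "coding") has an adapter: a small
# set of weight deltas applied on top of the frozen base model.
ADAPTERS = {
    "math":   {"layer0.delta": 0.10},
    "coding": {"layer0.delta": 0.25},
}

@dataclass
class UserPolicy:
    user_id: str
    unlocked: set = field(default_factory=set)

def select_adapters(policy: UserPolicy) -> list[dict]:
    """Return only the adapters the user's subscription unlocks.
    Locked features are simply omitted here; in the actual scheme a
    refusal adapter would make the model reject those requests."""
    return [ADAPTERS[f] for f in sorted(policy.unlocked) if f in ADAPTERS]

def merge_adapters(base_weights: dict, adapters: list[dict]) -> dict:
    """Fuse adapter deltas into a copy of the base weights per request,
    leaving the stored base model unmodified."""
    merged = dict(base_weights)
    for adapter in adapters:
        for name, delta in adapter.items():
            merged[name] = merged.get(name, 0.0) + delta
    return merged

base = {"layer0.delta": 1.0}
free_user = UserPolicy("u1")                          # nothing unlocked
premium = UserPolicy("u2", unlocked={"math", "coding"})

free_weights = merge_adapters(base, select_adapters(free_user))
premium_weights = merge_adapters(base, select_adapters(premium))
```

Because merging happens per request from the client's policy, no unlocked-feature weights ever ship to users who have not paid for them—unlike a single monolithic model gated only by prompts or credentials.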

📝 Abstract
Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models for free users, and advanced models for paying subscribers. However, a finer-grained pay-to-unlock scheme for premium features (e.g., math, coding) is thought to be more economically viable for the providers. Such a scheme requires a feature-locking technique (FLoTE) which is (i) effective in refusing locked features, (ii) utility-preserving for unlocked features, (iii) robust against evasion or unauthorized credential sharing, and (iv) scalable to multiple features and users. However, existing FLoTEs (e.g., password-locked models) are not robust or scalable. We present Locket, the first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a novel merging approach to attach adapters to an LLM for refusing unauthorized features. Our comprehensive evaluation shows that Locket is effective (100% refusal on locked features), utility-preserving (≤7% utility degradation in unlocked features), robust (≤5% attack success rate), and scales to multiple features and clients.
Problem

Research questions and friction points this paper is trying to address.

Developing robust feature-locking for premium model capabilities
Enabling pay-to-unlock schemes while preserving utility
Preventing evasion and unauthorized access to locked features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Merging adapters into LLMs for feature locking
Ensuring utility preservation in unlocked features
Providing robustness against evasion and credential sharing