🤖 AI Summary
To address intellectual property (IP) leakage caused by unauthorized cross-domain deployment of vision-language models (e.g., CLIP), this paper proposes a lightweight, prompt-learning-driven IP protection framework. Methodologically, it freezes the CLIP visual backbone (no fine-tuning required) while introducing three key innovations: (1) the first joint style-content IP prompting mechanism; (2) a style-enhancement branch coupled with cross-domain feature banks to enable self-enhancement and feature disentanglement; and (3) three customized evaluation metrics that precisely quantify the trade-off between authorized- and unauthorized-domain performance. Empirically, the method incurs minimal degradation (<1.2% accuracy drop) on authorized domains while reducing recognition accuracy on unauthorized domains to near-chance levels (≈1/N, where N is the number of classes). Comprehensive experiments across diverse scenarios validate its effectiveness and generalizability.
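The "near-chance" criterion above can be made concrete with a small sketch. The paper's three metrics are not defined in this summary, so the function below is a purely hypothetical illustration of how such a trade-off might be quantified: the accuracy drop on the authorized domain, and how far the unauthorized-domain accuracy sits above chance level 1/N.

```python
def protection_scores(acc_auth_before, acc_auth_after,
                      acc_unauth_after, num_classes):
    """Hypothetical trade-off measures (NOT the paper's metrics):
    - auth_drop: accuracy lost on the authorized domain (smaller is better)
    - unauth_gap: unauthorized-domain accuracy above chance 1/N
      (smaller is better; 0 means fully degraded to random guessing)
    """
    auth_drop = acc_auth_before - acc_auth_after
    chance = 1.0 / num_classes
    unauth_gap = max(acc_unauth_after - chance, 0.0)
    return auth_drop, unauth_gap

# Toy numbers: 65-class task, protection costs 1 point on the
# authorized domain and pushes the unauthorized domain near 1/65.
drop, gap = protection_scores(0.85, 0.84, 0.02, 65)
print(drop, gap)
```

Here a successful defense corresponds to a small `auth_drop` (the paper reports <1.2%) together with an `unauth_gap` close to zero.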
📝 Abstract
Vision-language models (VLMs) like CLIP (Contrastive Language-Image Pre-Training) have seen remarkable success in visual recognition, highlighting the increasing need to safeguard the intellectual property (IP) of well-trained models. Effective IP protection extends beyond ensuring authorized usage; it also necessitates restricting model deployment to authorized data domains, particularly when the model is fine-tuned for specific target domains. However, current IP protection methods often rely solely on the visual backbone, which may lack sufficient semantic richness. To bridge this gap, we introduce IP-CLIP, a lightweight IP protection strategy tailored to CLIP, employing a prompt-based learning approach. By leveraging the frozen visual backbone of CLIP, we extract both image style and content information and incorporate them into the learning of IP prompts. This strategy acts as a robust barrier, effectively preventing the unauthorized transfer of features from authorized domains to unauthorized ones. Additionally, we propose a style-enhancement branch that constructs feature banks for both authorized and unauthorized domains. This branch integrates self-enhanced and cross-domain features, further strengthening IP-CLIP's capability to block features from unauthorized domains. Finally, we present three new metrics designed to better balance performance degradation between authorized and unauthorized domains. Comprehensive experiments in various scenarios demonstrate IP-CLIP's promising potential for IP protection tasks on VLMs.
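The core idea of combining style and content from a frozen backbone into a learned prompt can be sketched minimally. The abstract does not specify the exact feature definitions, so the code below makes common assumptions: "style" is taken as channel-wise mean and standard deviation of an intermediate feature map (as in AdaIN-style domain methods), "content" is the backbone's global embedding, and `W_prompt` stands in for the learnable IP-prompt projection; the frozen CLIP encoder would supply both inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_stats(feat_map):
    """Channel-wise mean and std over spatial positions: a common
    proxy for image 'style' (an assumption, not the paper's spec)."""
    mu = feat_map.mean(axis=(1, 2))      # (C,)
    sigma = feat_map.std(axis=(1, 2))    # (C,)
    return np.concatenate([mu, sigma])   # (2C,)

def build_ip_prompt(feat_map, content_vec, W_prompt):
    """Project joint style-content features into a prompt vector.
    Only W_prompt would be trained; the backbone stays frozen."""
    joint = np.concatenate([style_stats(feat_map), content_vec])
    return joint @ W_prompt              # (prompt_dim,)

# Toy shapes: C=8 channels, 4x4 spatial map, 16-d content embedding,
# 32-d prompt (all sizes are illustrative).
C, H, W, D = 8, 4, 4, 16
feat_map = rng.standard_normal((C, H, W))
content = rng.standard_normal(D)
W_prompt = rng.standard_normal((2 * C + D, 32))  # hypothetical learnable weights

prompt = build_ip_prompt(feat_map, content, W_prompt)
print(prompt.shape)  # (32,)
```

Because only the small projection is optimized while the CLIP backbone is frozen, the approach stays lightweight, which matches the framing in the abstract.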