Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs

📅 2026-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the pressing challenge of copyright attribution in multimodal large language models (MLLMs) by proposing an adversarially guided dual-injection framework for robust and verifiable ownership protection. The method embeds learnable image tensors during model fine-tuning to encode specific copyright-triggering signals that elicit predefined ownership responses only in derivative models, while remaining ineffective in non-derivative ones. By integrating CLIP feature alignment, an auxiliary MLLM consistency loss, and adversarial training, the approach establishes a dual semantic injection mechanism that significantly enhances tracing robustness under aggressive fine-tuning and cross-domain transfer. Extensive experiments demonstrate that the proposed scheme reliably verifies model ownership with high accuracy across diverse fine-tuning protocols and domain-shift scenarios.

📝 Abstract
With the rapid deployment and widespread adoption of multimodal large language models (MLLMs), disputes over model version attribution and ownership have become increasingly frequent, raising significant concerns about intellectual property protection. In this paper, we propose a framework for generating copyright triggers for MLLMs, enabling model publishers to embed verifiable ownership information into the model. The goal is to construct trigger images that elicit ownership-related textual responses exclusively in fine-tuned derivatives of the original model, while remaining inert in non-derivative models. Our method constructs a tracking trigger image by treating the image as a learnable tensor and performing adversarial optimization with a dual injection of ownership-relevant semantic information. The first injection enforces textual consistency between the output of an auxiliary MLLM and a predefined ownership-relevant target text; the consistency loss is backpropagated to inject this ownership-related information into the image. The second injection operates at the semantic level by minimizing the distance between the CLIP features of the image and those of the target text. Furthermore, we introduce an additional adversarial training stage involving the auxiliary model, which is derived from the original model itself. This auxiliary model is specifically trained to resist generating the ownership-relevant target text, thereby enhancing robustness in heavily fine-tuned derivative models. Extensive experiments demonstrate the effectiveness of our dual-injection approach in tracking model lineage under various fine-tuning and domain-shift scenarios.
Problem

Research questions and friction points this paper is trying to address.

copyright protection
model ownership
multimodal large language models
intellectual property
model attribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

copyright protection
multimodal large language models
adversarial training
dual injection
ownership verification