PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage

📅 2025-04-15
🤖 AI Summary
Existing methods for copyright verification on multimodal datasets face a fundamental trade-off: invasive schemes degrade model accuracy, while non-invasive approaches rely on label-driven decision boundaries and suffer from poor stability. This paper proposes PATFinger, the first training-free, non-invasive multimodal fingerprinting mechanism, which requires neither fine-tuning nor access to the original training data. Its core idea is to use intrinsic cross-modal data distributions as the fingerprint source: a global optimal perturbation (GOP) maximizes cross-modal drift in embedding space, and adaptive prompts are then re-aligned with the GOP samples on a surrogate model for robust verification. The method enables, for the first time, highly transferable copyright attribution in cross-modal retrieval scenarios. Evaluated on mainstream architectures, it achieves a 30% improvement in detection accuracy over state-of-the-art methods, with zero accuracy degradation, strong generalization across models and datasets, and superior robustness against common transformations.

📝 Abstract
Multimodal datasets can be leveraged to pre-train large-scale vision-language models by providing cross-modal semantics. Current efforts to determine dataset usage mainly focus on single-modal dataset ownership verification through intrusive and non-intrusive methods, while cross-modal approaches remain under-explored. Intrusive methods can adapt to multimodal datasets but degrade model accuracy, while non-intrusive methods rely on label-driven decision boundaries that fail to guarantee stable behaviors for verification. To address these issues, we propose a novel prompt-adapted transferable fingerprinting scheme from a training-free perspective, called PATFinger, which incorporates the global optimal perturbation (GOP) and adaptive prompts to capture dataset-specific distribution characteristics. Our scheme uses inherent dataset attributes as fingerprints instead of compelling the model to learn triggers. The GOP is derived from the sample distribution to maximize embedding drifts between different modalities. Subsequently, PATFinger re-aligns the adaptive prompt with GOP samples to capture the cross-modal interactions on a carefully crafted surrogate model. This allows the dataset owner to check the usage of datasets by observing specific prediction behaviors linked to PATFinger during retrieval queries. Extensive experiments demonstrate the effectiveness of our scheme against unauthorized multimodal dataset usage on various cross-modal retrieval architectures, outperforming state-of-the-art baselines by 30%.
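
The idea of a single perturbation, shared by all samples, that pushes image embeddings away from their paired text embeddings can be illustrated with a toy sketch. This is not the paper's GOP algorithm: the linear encoder `W`, the dot-product similarity, the L-infinity budget `eps`, and the function names are all assumptions chosen to keep the example small and self-contained.

```python
import numpy as np

def universal_drift_perturbation(W, text_emb, dim, eps=0.05, steps=20, lr=0.01):
    """Toy sketch (assumed setup, not the paper's method): find one perturbation,
    shared by every image, that lowers the mean image-text similarity."""
    delta = np.zeros(dim)
    # With a linear encoder, similarity_i = text_emb[i] @ W @ (image_i + delta),
    # so the gradient of the mean similarity w.r.t. delta is constant.
    grad = W.T @ text_emb.mean(axis=0)
    for _ in range(steps):
        delta -= lr * np.sign(grad)        # signed step that reduces similarity
        delta = np.clip(delta, -eps, eps)  # project back onto the L-inf ball
    return delta

def mean_similarity(W, images, text_emb, delta=0.0):
    """Mean dot-product similarity between (perturbed) image and text embeddings."""
    img_emb = (images + delta) @ W.T
    return float(np.mean(np.sum(img_emb * text_emb, axis=1)))

# Synthetic paired data: text embeddings are noisy copies of image embeddings.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))            # hypothetical linear image encoder
images = rng.normal(size=(32, 16))      # 32 flattened toy "images"
text_emb = images @ W.T + 0.1 * rng.normal(size=(32, 8))

delta = universal_drift_perturbation(W, text_emb, dim=images.shape[1])
clean = mean_similarity(W, images, text_emb)
perturbed = mean_similarity(W, images, text_emb, delta)
print(f"mean similarity: clean={clean:.3f}, perturbed={perturbed:.3f}")
```

In this toy setting the perturbed similarity is lower than the clean one, which is the "embedding drift" the abstract refers to. A real vision-language encoder is nonlinear, so the gradient would have to be recomputed at every step, and the prompt re-alignment stage is not modeled here.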
Problem

Research questions and friction points this paper is trying to address.

Verifying unauthorized multimodal dataset usage
Overcoming accuracy loss in intrusive verification methods
Ensuring stable cross-modal dataset ownership verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free fingerprinting with global optimal perturbation
Adaptive prompts capture cross-modal interactions
Utilizes dataset attributes as verification fingerprints
Wenyi Zhang
Southeast University, Nanjing, China
Ju Jia
Southeast University, Nanjing, China
Xiaojun Jia
Nanyang Technological University
Explainable AI, Robust AI, Efficient AI
Yihao Huang
Nanyang Technological University, Singapore
Xinfeng Li
Nanyang Technological University, Singapore
Cong Wu
University of Hong Kong, Hong Kong
Lina Wang
Professor, Wuhan University
Computer Security