PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing copyright-verification methods for multimodal datasets face a trade-off: invasive schemes degrade model accuracy, while non-invasive approaches rely on label-driven decision boundaries and behave unstably. This paper proposes PATFinger, described as the first training-free, non-invasive multimodal fingerprinting mechanism, requiring neither fine-tuning nor access to the original training data. Its core idea is to use intrinsic cross-modal data distributions as the fingerprint source. A global optimal perturbation (GOP) and adaptive prompts model cross-modal drift in embedding space, and the prompts are then re-aligned with GOP samples on a surrogate model for robust verification. The method targets highly transferable copyright attribution in cross-modal retrieval scenarios. Evaluated on mainstream architectures, it is reported to improve detection accuracy by 30% over state-of-the-art methods, with no degradation in model accuracy, strong generalization across models and datasets, and robustness against common transformations.

📝 Abstract

Multimodal datasets provide the cross-modal semantics used to pre-train large-scale vision-language models. Current work on detecting dataset usage mainly addresses single-modal dataset ownership verification, through either intrusive or non-intrusive methods, while cross-modal approaches remain under-explored. Intrusive methods can be adapted to multimodal datasets but degrade model accuracy, while non-intrusive methods rely on label-driven decision boundaries that fail to guarantee stable behavior for verification. To address these issues, we propose PATFinger, a novel prompt-adapted transferable fingerprinting scheme designed from a training-free perspective, which combines a global optimal perturbation (GOP) with adaptive prompts to capture dataset-specific distribution characteristics. Our scheme uses inherent dataset attributes as fingerprints instead of forcing the model to learn triggers. The GOP is derived from the sample distribution to maximize the embedding drift between modalities. PATFinger then re-aligns the adaptive prompt with the GOP samples on a carefully crafted surrogate model to capture the cross-modal interactions. This allows the dataset owner to check whether the dataset was used by observing specific prediction behaviors linked to PATFinger during retrieval queries. Extensive experiments on various cross-modal retrieval architectures demonstrate the effectiveness of our scheme against unauthorized multimodal dataset usage, outperforming state-of-the-art baselines by 30%.
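To make the idea of a perturbation "derived from the sample distribution to maximize embedding drift" more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' algorithm: the linear image encoder `W`, the precomputed text embeddings, the L-infinity budget `eps`, the signed-gradient update, and all function names are hypothetical choices made for illustration. The sketch fits one perturbation shared by every image so that the mean cosine similarity between perturbed image embeddings and their paired text embeddings drops.

```python
import numpy as np

def mean_cosine(W, images, text_emb, delta):
    """Mean cosine similarity between perturbed image embeddings and paired text embeddings."""
    z = (images + delta) @ W
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(z * t, axis=1)))

def fit_shared_perturbation(W, images, text_emb, eps=0.5, step=0.05, iters=50):
    """Fit ONE perturbation for the whole dataset (toy stand-in for a GOP).

    Assumptions, not from the paper: a linear encoder z = (x + delta) @ W,
    an L-infinity budget eps, and signed-gradient descent on mean cosine.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    delta = np.zeros(images.shape[1])
    best_delta = delta.copy()
    best_score = mean_cosine(W, images, text_emb, delta)
    for _ in range(iters):
        z = (images + delta) @ W
        z_norm = np.linalg.norm(z, axis=1, keepdims=True)
        z_hat = z / z_norm
        cos = np.sum(z_hat * t, axis=1, keepdims=True)
        # Gradient of cos(z, t) with respect to z, per sample.
        grad_z = (t - cos * z_hat) / z_norm
        # Chain rule through the linear encoder, averaged over the dataset.
        grad_delta = (grad_z @ W.T).mean(axis=0)
        # Step against the gradient to lower similarity, then project onto the budget.
        delta = np.clip(delta - step * np.sign(grad_delta), -eps, eps)
        score = mean_cosine(W, images, text_emb, delta)
        if score < best_score:  # keep the perturbation with the largest drift so far
            best_delta, best_score = delta.copy(), score
    return best_delta

# Toy usage on synthetic "image" features with roughly aligned "text" embeddings.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
images = rng.normal(size=(64, 32))
text_emb = images @ W + rng.normal(size=(64, 16))
gop = fit_shared_perturbation(W, images, text_emb)
print(mean_cosine(W, images, text_emb, np.zeros(32)), mean_cosine(W, images, text_emb, gop))
```

In a real setting the encoder would be a nonlinear vision-language model and the gradient would come from automatic differentiation; the shared-perturbation structure is the only part this sketch is meant to convey.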

Problem

Research questions and friction points this paper is trying to address.

Verifying unauthorized multimodal dataset usage

Overcoming accuracy loss in intrusive verification methods

Ensuring stable cross-modal dataset ownership verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free fingerprinting with global optimal perturbation

Adaptive prompts that capture cross-modal interactions

Inherent dataset attributes used as verification fingerprints
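The abstract states that the owner verifies usage by observing prediction behaviors tied to the fingerprint during retrieval queries, but it does not give the decision rule. The sketch below shows one plausible shape such a check could take. Everything in it is an assumption for illustration: the top-1 hit-rate statistic, the comparison against a reference model known not to have seen the dataset, the arbitrary `margin`, and both function names.

```python
import numpy as np

def fingerprint_hit_rate(similarity, expected_index):
    """Fraction of fingerprint queries whose top-1 retrieved candidate is the expected one.

    similarity: (num_queries, num_candidates) scores returned by a retrieval model.
    expected_index: (num_queries,) candidate index each fingerprint query should retrieve.
    """
    top1 = np.argmax(similarity, axis=1)
    return float(np.mean(top1 == np.asarray(expected_index)))

def flag_usage(suspect_sim, reference_sim, expected_index, margin=0.5):
    """Hypothetical rule: flag the suspect model if its hit rate exceeds the
    reference model's hit rate by at least `margin` (an arbitrary placeholder)."""
    gap = (fingerprint_hit_rate(suspect_sim, expected_index)
           - fingerprint_hit_rate(reference_sim, expected_index))
    return gap >= margin

# Toy usage: three fingerprint queries, two candidates each.
suspect = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]])
reference = np.array([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]])
expected = np.array([1, 1, 1])
print(flag_usage(suspect, reference, expected))
```

A practical check would likely replace the fixed margin with a statistical test over many queries; the point here is only that verification needs black-box retrieval outputs, not model weights.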

🔎 Similar Papers

EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations