🤖 AI Summary
This study addresses the vulnerability of contrastive learning models to data-poisoning backdoor attacks, a risk that arises because such models are commonly trained on third-party data. The authors systematically evaluate existing attacks and find that they adapt poorly across datasets, achieve low success rates, transfer poorly, and rest on restrictive assumptions. They then repurpose these typically weak backdoor effects as watermarking signals for dataset intellectual property (IP) protection, an area that previously lacked dedicated mechanisms in this setting. The work proposes a statistical verification method based on a unified density metric and a multi-level watermarking scheme that supports feature-level, soft-label, and hard-label outputs. Experiments show that some backdoor attacks can be repurposed as effective watermarks, with trade-offs among fidelity, verifiability, and robustness, offering a practical route to dataset IP protection in contrastive learning.
📝 Abstract
Contrastive learning (CL) reduces annotation cost by deriving supervisory signals automatically. Because building large-scale CL datasets in-house is infeasible, reliance on third-party or internet data is common. Recent studies show that CL models are vulnerable to data-poisoning backdoor attacks, but the generalization and robustness of these attacks remain underexplored. We systematically evaluate existing data-poisoning backdoor attacks on CL and reveal their limitations: poor dataset adaptability, low success rates, limited portability, and restrictive assumptions (e.g., knowledge of the downstream task). Interestingly, trigger samples exhibit a distinguishable statistical divergence from clean samples, which inspires repurposing this divergence as a watermark for dataset IP protection. Direct repurposing is challenging because of the low success rates; we overcome this with statistical verification based on a unified density metric. We further propose a multi-level watermarking scheme that adapts to feature-level, soft-label, or hard-label outputs in CL. Experiments show that some backdoor attacks can be repurposed as effective watermarks, with trade-offs among fidelity, verifiability, and robustness. This work demonstrates that weak backdoor effects can become reliable signals for dataset IP protection in challenging CL settings.
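
To make the idea of statistical verification concrete, the following is a minimal Python sketch of how a feature-level check might look. It is not the paper's method: the abstract does not define the unified density metric, so this sketch substitutes a simple k-nearest-neighbor cosine distance as a stand-in density score, and it uses a one-sided Mann-Whitney U test as the verification step. The function names `knn_density_score` and `verify_watermark`, the parameters `k` and `alpha`, and the synthetic embeddings are all assumptions introduced for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def knn_density_score(emb, k=5):
    """Mean cosine distance from each embedding to its k nearest
    neighbors within the same set (lower score = denser region).
    Stand-in for the paper's density metric, which is not specified here."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T              # pairwise cosine distances
    np.fill_diagonal(dist, np.inf)    # exclude each point's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


def verify_watermark(trigger_emb, clean_emb, k=5, alpha=0.01):
    """Test whether trigger embeddings are significantly denser than clean ones.
    Returns (detected, p_value). Hypothetical decision rule: p < alpha."""
    trig_scores = knn_density_score(trigger_emb, k)
    clean_scores = knn_density_score(clean_emb, k)
    # One-sided test: are trigger scores stochastically smaller (denser)?
    _, p_value = mannwhitneyu(trig_scores, clean_scores, alternative="less")
    return bool(p_value < alpha), float(p_value)


# Synthetic demo (assumed data, not results from the paper):
# clean embeddings are spread out, trigger embeddings collapse near one point.
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 32))
center = rng.normal(size=32)
trigger = center + 0.05 * rng.normal(size=(100, 32))

detected, p = verify_watermark(trigger, clean)
print("watermark detected:", detected, "p-value:", p)
```

The point of a rank-based test in this sketch is that verification does not require every trigger sample to behave like a successful backdoor; a consistent shift in the density distribution is enough, which mirrors the abstract's argument that weak backdoor effects can still yield a verifiable signal. Soft-label or hard-label settings would need a different score computed from model outputs rather than embeddings.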