Repurposing and Evaluating the (In)Feasibility of Dataset Poisoning enabled Watermarking for Contrastive Learning

📅 2026-05-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines contrastive learning models, which often rely on third-party data and are therefore exposed to data-poisoning backdoor attacks. A systematic evaluation shows that existing attacks suffer from poor dataset adaptability, low success rates, limited portability, and restrictive assumptions such as knowledge of the downstream task. The work then repurposes these typically weak backdoor effects as watermarking signals for dataset intellectual property (IP) protection, a setting that previously lacked such mechanisms. It proposes a statistical verification method based on a unified density metric and a multi-level watermarking scheme that supports feature-level, soft-label, and hard-label outputs. Experiments show that some backdoor attacks can be repurposed as effective watermarks, with trade-offs among fidelity, verifiability, and robustness.
📝 Abstract
Contrastive learning (CL) reduces annotation cost via auto-derived supervisory signals. Since building large-scale CL datasets in-house is often infeasible, reliance on third-party or internet data is common. Recent studies show that CL models are vulnerable to data-poisoning backdoor attacks, but the generalization and robustness of these attacks are underexplored. We systematically evaluate existing data-poisoning backdoor attacks on CL, revealing limitations: poor dataset adaptability, low success rates, limited portability, and restrictive assumptions (e.g., downstream task knowledge). Interestingly, trigger samples exhibit a distinguishable statistical divergence from clean samples, which inspires repurposing this divergence as a watermark for dataset IP protection. Direct repurposing is challenging due to low attack success rates; we overcome this with statistical verification based on a unified density metric. We further propose a multi-level watermarking scheme that adapts to feature-level, soft-label, or hard-label outputs in CL. Experiments show that some backdoor attacks can be repurposed as effective watermarks, with trade-offs among fidelity, verifiability, and robustness. This work demonstrates that weak backdoor effects can become reliable signals for dataset IP protection in challenging CL settings.
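The abstract does not define the unified density metric or the verification test, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a hypothetical density score (mean pairwise cosine similarity) and a one-sided permutation test that asks whether a suspect model's outputs on trigger samples are more tightly clustered than its outputs on clean samples. The names `density`, `verify_watermark`, and `toy_outputs` are illustrative and do not come from the paper. The same score can be applied to feature vectors, soft-label probability vectors, or one-hot hard labels, which loosely mirrors the three output levels mentioned above.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(outputs):
    """Hypothetical density score (an assumption): mean pairwise cosine similarity."""
    n = len(outputs)
    if n < 2:
        raise ValueError("need at least two outputs")
    total = sum(
        cosine(outputs[i], outputs[j]) for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def verify_watermark(trigger_out, clean_out, n_perm=200, alpha=0.05, seed=0):
    """One-sided permutation test: are trigger outputs denser than clean outputs?"""
    rng = random.Random(seed)
    observed = density(trigger_out) - density(clean_out)
    pooled = list(trigger_out) + list(clean_out)
    k = len(trigger_out)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        if density(pooled[:k]) - density(pooled[k:]) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return observed, p_value, p_value < alpha


def toy_outputs(n, center, noise, rng):
    """Synthetic stand-in for model outputs: `center` plus Gaussian noise."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 8
    # Assumed toy setting: trigger outputs collapse around one direction,
    # clean outputs are spread in all directions.
    trigger = toy_outputs(20, [1.0] + [0.0] * (dim - 1), 0.1, rng)
    clean = toy_outputs(20, [0.0] * dim, 1.0, rng)
    gap, p, detected = verify_watermark(trigger, clean)
    print(f"density gap={gap:.3f}, p-value={p:.4f}, watermark detected={detected}")
```

A permutation test is used here because it makes no distributional assumption about the score, which seems appropriate when the backdoor effect is weak. The paper's actual test statistic and decision rule may differ.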
Problem

Research questions and friction points this paper is trying to address.

contrastive learning
dataset poisoning
watermarking
IP protection
backdoor attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset watermarking
contrastive learning
data poisoning
statistical verification
IP protection
🔎 Similar Papers
Zhiyang Dai
School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Yansong Gao
University of Western Australia, Perth, Australia
Boyu Kuang
School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Haodong Li
UC San Diego. Prev: HKUST, ZJU, Tencent.
3DV, Generative Models, Agents
Qi Chang
College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Gaurav Varshney
Assistant Professor, Department of CSE, IIT Jammu
Anti Phishing, DNS/TLS/HTTP and Network Security, Information Security, Threat Modeling, Digital Payments Security
Derek Abbott
Professor of Electrical & Electronic Engineering, University of Adelaide, Australia
terahertz, photonics, complex systems, stochastics, quantum mechanics
Anmin Fu
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China