Can Targeted Clean-Label Poisoning Attacks Generalize?

📅 2024-12-05
🏛️ arXiv.org
🤖 AI Summary
This work addresses the challenge of making clean-label targeted data poisoning (TDP) attacks generalize under realistic physical variations, including changes in viewpoint, illumination, background, and target appearance. Because no systematic framework exists for evaluating or modeling TDP generalizability across such variations, we propose the first generalization-aware evaluation framework designed specifically for target variants. We further introduce a generalizable poisoning method that optimizes both the direction and the magnitude of model gradients, combining gradient-aware optimization with modeling of intra-target variation. It achieves robust cross-variant attack performance while keeping overall accuracy similar to that of the baseline attack. Extensive experiments on two image benchmark datasets and four model architectures show that the approach improves the average attack success rate by 20.95% over the state-of-the-art cosine-similarity-based baseline.

📝 Abstract
Targeted poisoning attacks aim to compromise the model's prediction on specific target samples. In a common clean-label setting, they are achieved by slightly perturbing a subset of training samples given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa
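
The abstract contrasts the widely adopted cosine-similarity objective, which aligns only the direction of the poison gradient with the target gradient, with an objective that also uses gradient magnitude. A minimal sketch of that distinction, assuming gradients are already flattened into plain Python lists: `cosine_matching_loss` follows the usual cosine gradient-matching form (one minus cosine similarity), while `direction_magnitude_loss` and its weight `lam` are hypothetical illustrations of one way to add a magnitude term, not the loss proposed in the paper.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Sequence[float]) -> float:
    return math.sqrt(_dot(a, a))


def cosine_matching_loss(g_poison: Sequence[float], g_target: Sequence[float]) -> float:
    """Direction-only objective: 1 - cos(g_poison, g_target).

    g_target: gradient of the training loss on the target, using the adversarial label.
    g_poison: gradient of the training loss on the perturbed poisons, using their clean labels.
    Scaling g_poison by any positive factor leaves this value unchanged.
    """
    if len(g_poison) != len(g_target):
        raise ValueError("gradients must have the same length")
    denom = _norm(g_poison) * _norm(g_target)
    if denom == 0.0:
        raise ValueError("gradients must be non-zero")
    return 1.0 - _dot(g_poison, g_target) / denom


def direction_magnitude_loss(
    g_poison: Sequence[float], g_target: Sequence[float], lam: float = 1.0
) -> float:
    """HYPOTHETICAL magnitude-aware variant (not the paper's loss):
    cosine term plus lam * (relative norm mismatch) ** 2."""
    direction_term = cosine_matching_loss(g_poison, g_target)  # also validates inputs
    rel_gap = (_norm(g_poison) - _norm(g_target)) / _norm(g_target)
    return direction_term + lam * rel_gap ** 2


if __name__ == "__main__":
    g_target = [1.0, 2.0, 2.0]  # toy target gradient with norm 3
    g_small = [0.1, 0.2, 0.2]   # same direction, one tenth the norm
    print(cosine_matching_loss(g_small, g_target))      # ~0.0
    print(direction_magnitude_loss(g_small, g_target))  # ~0.81
```

Because cosine similarity is scale-invariant, a poison gradient one tenth the size of the target gradient still scores a cosine loss of about 0, whereas the hypothetical magnitude-aware variant penalizes it (about 0.81 with `lam=1.0`).
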
Problem

Research questions and challenges this paper addresses.

Study whether targeted clean-label poisoning generalizes to unseen physical variations of the target
Optimize both gradient direction and magnitude to improve poisoning success
Assess the real-world threat posed by targets that appear under varying physical conditions
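
Studying generalization means scoring an attack on variants of the target rather than only on the single target image used to craft the poisons. A minimal sketch, assuming attack success rate (ASR) is measured as the fraction of target variants that the poisoned model assigns to the adversarial label; the function name `attack_success_rate`, the `predict` callable, and the toy variant names are hypothetical, and the paper's exact aggregation may differ.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def attack_success_rate(
    predict: Callable[[T], int],
    variants: Sequence[T],
    adversarial_label: int,
) -> float:
    """Fraction of target variants the poisoned model labels as adversarial_label.

    predict: hypothetical stand-in for the poisoned model's top-1 prediction.
    variants: unseen variations of one target (e.g. other viewpoints or lighting).
    """
    if not variants:
        raise ValueError("need at least one target variant")
    hits = sum(1 for v in variants if predict(v) == adversarial_label)
    return hits / len(variants)


if __name__ == "__main__":
    # Toy predictions for four hypothetical views of one target object.
    toy_predictions = {"front": 3, "side": 3, "back": 7, "low_light": 1}
    asr = attack_success_rate(
        toy_predictions.__getitem__, list(toy_predictions), adversarial_label=3
    )
    print(asr)  # 0.5
```

Scoring the original target alone would hide the failure mode studied here: an attack can succeed on the exact image it was crafted for yet miss other variants of the same target.
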
Innovation

Methods, ideas, or system contributions that make the work stand out.

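Matches both the direction and the magnitude of model gradients, not direction alone
Makes gradient-matching poisons generalize to unseen target variations
Improves attack success rate by 20.95% on average over the cosine-similarity-based attack, with similar overall accuracy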
Zhizhen Chen
University of Virginia
Trustworthy ML

Subrat Kishore Dutta
Saarland University, CISPA Helmholtz Center for Information Security

Zhengyu Zhao
Xi'an Jiaotong University, China
Adversarial Machine Learning, Computer Vision

Chenhao Lin
Xi'an Jiaotong University

Chao Shen
Xi'an Jiaotong University

Xiao Zhang
CISPA Helmholtz Center for Information Security