Can Targeted Clean-Label Poisoning Attacks Generalize?

📅 2024-12-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies whether clean-label targeted data poisoning (TDP) attacks generalize to realistic physical variations of the target, including changes in viewpoint, illumination, background, and target appearance. Because no systematic framework existed for evaluating or modeling TDP generalizability across such variations, we propose the first generalization-aware evaluation framework designed for target variants. We further introduce a poisoning method that exploits both the direction and the magnitude of model gradients, via gradient-aware optimization and modeling of intra-target variation, achieving robust cross-variant attack performance without compromising the model's clean accuracy. Averaged over four model architectures on two image benchmark datasets, our approach improves attack success rate by 20.95% over the state-of-the-art cosine-similarity-based baseline.

📝 Abstract

Targeted poisoning attacks aim to compromise the model's prediction on specific target samples. In a common clean-label setting, they are achieved by slightly perturbing a subset of training samples given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa
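
To make the contrast between direction-only and direction-plus-magnitude objectives concrete, here is a minimal sketch in Python with NumPy. It is not the paper's objective: the squared Euclidean distance is used only as an illustrative stand-in for a loss that is sensitive to gradient magnitude, and the gradient vectors and function names are hypothetical.

```python
import numpy as np

def cosine_matching_loss(g_target, g_poison):
    """1 - cosine similarity: depends only on the gradient direction."""
    denom = np.linalg.norm(g_target) * np.linalg.norm(g_poison)
    return 1.0 - float(np.dot(g_target, g_poison) / denom)

def euclidean_matching_loss(g_target, g_poison):
    """Squared Euclidean distance: depends on direction and magnitude.
    Illustrative stand-in only, not the loss used in the paper."""
    diff = g_target - g_poison
    return float(np.dot(diff, diff))

# Hypothetical gradients: the poison gradient points the same way as the
# target gradient but has one tenth of its norm.
g_target = np.array([3.0, 4.0])
g_poison_small = 0.1 * g_target

cos_loss = cosine_matching_loss(g_target, g_poison_small)
euc_loss = euclidean_matching_loss(g_target, g_poison_small)
print(f"cosine loss:    {cos_loss:.4f}")
print(f"euclidean loss: {euc_loss:.4f}")
```

The cosine loss is essentially zero because the two gradients are perfectly aligned, while the Euclidean loss stays large because the poison gradient is much smaller. A direction-only objective therefore cannot tell a strong poison gradient from a weak one.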

Problem

Research questions and friction points this paper is trying to address.

Study generalizability of targeted data poisoning across physical variations

Optimize gradient direction and magnitude for better poisoning success

Address real-world threats in varying physical conditions
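
One way to picture generalization across target variants is to measure attack success over a set of variants rather than on a single known target. The sketch below assumes that attack success rate is the fraction of target variants the poisoned model assigns to the attacker's intended class. The exact evaluation protocol is not stated on this page, and the labels and function name are hypothetical.

```python
def attack_success_rate(predictions, adversarial_label):
    """Fraction of target variants predicted as the adversarial label.

    Assumed definition for illustration; the paper's protocol may differ.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == adversarial_label)
    return hits / len(predictions)

# Hypothetical predictions of a poisoned model on four variants of one
# target (for example, the same object seen from different viewpoints).
variant_predictions = ["dog", "cat", "dog", "dog"]
asr = attack_success_rate(variant_predictions, "dog")
print(f"attack success rate over variants: {asr:.2f}")
```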

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes gradient direction and magnitude

Enhances generalizable gradient matching

Improves attack success rate by 20.95% on average over the cosine similarity-based attack

🔎 Similar Papers

Model-agnostic clean-label backdoor mitigation in cybersecurity environments