Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

πŸ“… 2025-09-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work investigates instance-level disparities in vulnerability among test samples under targeted data poisoning attacks on classification models, addressing the neglect of sample heterogeneity in prior research. It proposes three quantifiable and interpretable metricsβ€”*ergodic prediction accuracy*, *poison distance*, and *poison budget*β€”that characterize per-sample attack difficulty from complementary perspectives: model training dynamics, geometric proximity in feature space, and resource constraints, respectively. Experiments across diverse poisoning scenarios show that these metrics predict real-world attack success rates substantially more accurately than baseline methods. To the authors' knowledge, this is the first systematic study of the intrinsic mechanisms governing test-sample vulnerability. The proposed framework provides both theoretical grounding and practical tools for fine-grained security assessment, explainable attack analysis, and adaptive defense design against data poisoning.
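The first two metrics lend themselves to a simple sketch. Below is a minimal, hypothetical formulation (the paper's exact definitions may differ): ergodic prediction accuracy is taken as the fraction of clean-training epochs in which the model predicts the target sample correctly, and poison distance as the feature-space distance from the target sample to its nearest neighbor in the attacker's desired class.

```python
import numpy as np

def ergodic_prediction_accuracy(epoch_preds, true_label):
    """Fraction of clean-training epochs in which the target sample was
    predicted correctly. `epoch_preds` holds one predicted label per epoch.
    (Hypothetical reading of the paper's 'ergodic prediction accuracy'.)"""
    epoch_preds = np.asarray(epoch_preds)
    return float(np.mean(epoch_preds == true_label))

def poison_distance(target_feat, attacker_class_feats):
    """Euclidean distance in feature space from the target sample to the
    nearest training sample of the attacker's desired class.
    (One plausible reading of the paper's 'poison distance'.)"""
    diffs = np.asarray(attacker_class_feats) - np.asarray(target_feat)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

# Toy usage: a sample predicted correctly in 7 of 10 epochs, and 2-D features.
epa = ergodic_prediction_accuracy([1, 1, 1, 0, 1, 1, 1, 0, 1, 0], 1)  # 0.7
pd = poison_distance([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])  # 1.0
```

Under this reading, a low ergodic prediction accuracy or a small poison distance would flag a test sample as easier to attack; the poison budget is then the empirical number of poisoned training points needed to flip its prediction.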


πŸ“ Abstract
Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates. These attacks aim to manipulate the prediction for a single test sample in classification models. Unlike indiscriminate attacks that aim to decrease overall test performance, targeted attacks present a unique threat to individual test instances. This threat model raises a fundamental question: what factors make certain test samples more susceptible to successful poisoning than others? We investigate how attack difficulty varies across different test instances and identify key characteristics that influence vulnerability. This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget. Our experimental results demonstrate that these metrics effectively predict the varying difficulty of real-world targeted poisoning attacks across diverse scenarios, offering practitioners valuable insights for vulnerability assessment and understanding data poisoning attacks.
Problem

Research questions and friction points this paper is trying to address.

Identifying factors affecting test sample poisoning susceptibility
Quantifying instance-level difficulty in targeted data poisoning attacks
Developing predictive criteria for data poisoning vulnerability assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three predictive criteria for per-sample poisoning difficulty
Ergodic prediction accuracy derived from clean training dynamics
Poison distance and poison budget as complementary difficulty measures