Density-aware Sample-specific Attack

📅 2026-05-26

🤖 AI Summary

This work addresses the vulnerability of existing backdoor attacks to post-training defenses such as fine-tuning and pruning, which often sharply reduce the attack success rate (ASR). Based on an analysis under a Bayes-optimal model of the victim's training, the study links sample-specific trigger construction to the density of the data distribution: steering triggered samples toward low-density regions of the clean data distribution simultaneously optimizes attack success and clean-accuracy preservation. To achieve this, the authors introduce a bilevel optimization framework that estimates density ratios via conditional time-score matching and optimizes a mixture-model objective for sample-specific trigger placement. Experiments on MNIST, CIFAR-10, GTSRB, and TinyImageNet show that the method achieves above 99% ASR before defense, retains post-defense ASR 50–85 percentage points higher than the strongest baselines under fine-tuning defenses, and is unaffected by neuron-pruning defenses, which identify zero neurons for removal across all pruning thresholds.
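The two quantities the summary reports, attack success rate and clean accuracy, can be made concrete with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and it assumes the common convention of excluding samples whose true label already equals the target class when computing ASR. The paper's exact evaluation protocol may differ.

```python
import numpy as np


def clean_accuracy(pred_clean, labels):
    """Fraction of clean (untriggered) inputs classified correctly."""
    pred_clean = np.asarray(pred_clean)
    labels = np.asarray(labels)
    return float(np.mean(pred_clean == labels))


def attack_success_rate(pred_triggered, labels, target_class):
    """Fraction of triggered inputs classified as the attacker's target.

    Assumption: samples whose true label is already the target class are
    excluded, since they would count as successes without any backdoor.
    """
    pred_triggered = np.asarray(pred_triggered)
    labels = np.asarray(labels)
    mask = labels != target_class
    if not mask.any():
        raise ValueError("no samples outside the target class")
    return float(np.mean(pred_triggered[mask] == target_class))


def gap_in_percentage_points(asr_a, asr_b):
    """Difference between two rates (given in [0, 1]) in percentage points."""
    return 100.0 * (asr_a - asr_b)
```

A gap reported in percentage points is the plain difference of two rates, not a relative change: a post-defense ASR of 0.90 against a baseline of 0.30 is a 60-point gap.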

📝 Abstract

Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive principled criteria characterizing optimal sample-specific trigger construction under a Bayes-optimal model of the victim's training. Our analysis reveals that both attack success and clean-accuracy preservation are simultaneously optimized when triggered samples are steered into low-density regions of the clean data distribution, a distributional condition that controls all moments of the poisoned distribution at once rather than a handful of input-space summary statistics. We introduce a bilevel optimization framework that estimates density ratios via conditional time-score matching and optimizes a mixture-model objective to place triggered samples in these sparse regions. Extensive evaluations on MNIST, CIFAR-10, GTSRB, and TinyImageNet demonstrate that our method achieves above 99% attack success rate before defense and retains 50–85 percentage points higher post-defense ASR than the strongest baselines under fine-tuning defenses. Against neuron-pruning defenses, the method exhibits complete immunity, with zero neurons identified for removal across all pruning thresholds. These results expose a fundamental gap in current defense paradigms and underscore the need for defenses that operate beyond the support of the clean distribution.
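The idea of steering a triggered sample into a low-density region can be illustrated with a toy sketch. This is not the authors' method: where the paper estimates density ratios via conditional time-score matching inside a bilevel optimization, the sketch swaps in a closed-form isotropic Gaussian mixture as a stand-in for the clean density and takes signed gradient steps on its log-density under an L-infinity budget. All names and parameters (`steer_to_low_density`, `eps`, `step`, `n_steps`) are hypothetical.

```python
import numpy as np


def log_density_and_grad(x, means, sigma, weights):
    """Return log p(x) and its gradient for an isotropic Gaussian mixture."""
    diffs = x - means                         # shape (K, D)
    sq_dist = np.sum(diffs ** 2, axis=1)      # shape (K,)
    dim = x.shape[0]
    log_comp = (np.log(weights)
                - 0.5 * sq_dist / sigma ** 2
                - 0.5 * dim * np.log(2.0 * np.pi * sigma ** 2))
    # Log-sum-exp for numerical stability.
    m = log_comp.max()
    log_p = m + np.log(np.sum(np.exp(log_comp - m)))
    resp = np.exp(log_comp - log_p)           # responsibilities, sum to 1
    grad = -(resp[:, None] * diffs).sum(axis=0) / sigma ** 2
    return log_p, grad


def steer_to_low_density(x, means, sigma, weights,
                         eps=0.5, step=0.05, n_steps=100):
    """Perturb x within an L-infinity ball of radius eps to lower log p."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        _, grad = log_density_and_grad(x + delta, means, sigma, weights)
        # Signed descent step on log p, then project back into the budget.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return x + delta


# Toy usage: a clean sample near one mode of a two-component mixture.
means = np.array([[0.0, 0.0], [3.0, 3.0]])
weights = np.array([0.5, 0.5])
x_clean = np.array([0.1, -0.2])
x_triggered = steer_to_low_density(x_clean, means, 1.0, weights)
```

In this toy setting the perturbation simply pushes the sample away from the nearest mode until it reaches the budget. The perturbation depends on the input, which is the sense in which the trigger is sample-specific.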

Problem

Research questions and friction points this paper is trying to address.

backdoor attacks

post-training defenses

attack success rate

clean-accuracy preservation

density-aware

Innovation

Methods, ideas, or system contributions that make the work stand out.

density-aware backdoor

sample-specific trigger

bilevel optimization