AI Summary
Existing backdoor attacks against deep neural networks lack theoretical foundations and rely heavily on heuristic, brute-force search strategies. Method: This paper establishes the first rigorous theoretical framework for backdoor attacks, revealing that sparse decision boundaries inherently render models highly sensitive to minute quantities of poisoned samples. We introduce the concept of "closed-form fuzzy boundary regions" and integrate influence function analysis with margin-driven boundary manipulation to achieve highly robust and stealthy black-box attacks at an ultra-low poisoning rate (<0.1%). Contribution/Results: Our framework achieves a >90% attack success rate while preserving clean accuracy with negligible degradation. It further demonstrates superior cross-model, cross-dataset, and cross-scenario transferability compared to state-of-the-art methods. By providing an interpretable and predictive theoretical foundation, this work advances both the modeling and defense of backdoor attacks.
Abstract
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks, which typically rely on heuristic, brute-force methods. Despite significant empirical advances in backdoor research, the lack of rigorous theoretical analysis limits understanding of the underlying mechanisms, constraining attack predictability and adaptability. We therefore provide a theoretical analysis of backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. Based on this finding, we derive a closed-form, ambiguous boundary region wherein a negligible number of relabeled samples induces substantial misclassification. Influence function analysis further quantifies the significant parameter shifts caused by these margin samples, which have minimal impact on clean accuracy, formally grounding why such low poison rates suffice for effective attacks. Leveraging these insights, we propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties. Eminence optimizes a universal, visually subtle trigger that strategically exploits vulnerable decision boundaries and achieves robust misclassification at exceptionally low poison rates (<0.1%, compared to SOTA methods that typically require >1%). Comprehensive experiments validate our theoretical analysis and demonstrate the effectiveness of Eminence, confirming an exponential relationship between margin poisoning and adversarial boundary manipulation. Eminence maintains a >90% attack success rate, exhibits negligible clean-accuracy loss, and demonstrates high transferability across diverse models, datasets, and scenarios.
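The influence-function analysis referenced above is typically built on the classical first-order approximation of how upweighting a single training point shifts the learned parameters. As a hedged sketch (this is the standard formulation, e.g. from the influence-functions literature, not necessarily the paper's exact derivation), let L(z, θ) be the training loss on point z, θ̂ the minimizer of the empirical risk over n points, and H the empirical Hessian at θ̂:

$$
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}\nabla_\theta^2 L(z_i,\hat\theta),
\qquad
\hat\theta_{\epsilon,z} - \hat\theta \;\approx\; -\,\epsilon\, H_{\hat\theta}^{-1}\,\nabla_\theta L(z,\hat\theta).
$$

Under this approximation, a relabeled sample z lying near the decision margin has a large gradient ∇θ L(z, θ̂), so even a small upweighting ε (i.e., a tiny poison rate) can induce a disproportionately large parameter shift, consistent with the abstract's claim that sub-0.1% poisoning suffices.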