🤖 AI Summary
Backdoor attacks pose a severe threat to the security of deep learning models, yet existing defenses are evaluated under heterogeneous protocols that hinder fair comparison. This work presents a large-scale meta-analysis of 183 backdoor defense papers published between 2018 and 2025, covering three benchmarks (MNIST, CIFAR-100, ImageNet-1K), four architectures (ResNet-18, VGG-19, ViT-B/16, DenseNet-121), 16 defense methods, and five attack strategies. Through over 3,000 empirical experiments, we systematically identify critical evaluation flaws, including unreported computational overhead, neglect of benign accuracy, and hyperparameter selection bias. We propose a standardized evaluation framework that specifies experimental configurations, metric definitions, and threat-model conventions. Our results demonstrate that minor variations in evaluation settings induce substantial fluctuations in reported defense efficacy. The framework establishes a reproducible, comparable, and deployable benchmark, advancing backdoor defense research toward rigor and practicality for both academia and industry.
📝 Abstract
Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by maliciously crafted inputs. While numerous defenses have been proposed to mitigate these attacks, the heterogeneous landscape of evaluation methodologies hinders fair comparison among them. This work presents a systematic meta-analysis of backdoor defenses through a comprehensive literature review and empirical evaluation. We analyze 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues, examining the properties and evaluation methodologies of the proposed defenses.
Our analysis reveals significant inconsistencies in experimental setups, evaluation metrics, and threat-model assumptions across the literature. Through extensive experiments involving three datasets (MNIST, CIFAR-100, ImageNet-1K), four model architectures (ResNet-18, VGG-19, ViT-B/16, DenseNet-121), 16 representative defenses, and five commonly used attacks, totaling over 3,000 experiments, we demonstrate that defense effectiveness varies substantially with the evaluation setup. We identify critical gaps in current evaluation practices, including insufficient reporting of computational overhead and of behavior on benign inputs, bias in hyperparameter selection, and incomplete experimentation. Based on these findings, we provide concrete challenges and well-motivated recommendations to standardize and improve future defense evaluations. Our work aims to equip researchers and industry practitioners with actionable insights for developing, assessing, and deploying backdoor defenses in real-world systems.
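The benign-behavior gap noted above can be made concrete: a standardized evaluation reports both clean accuracy (performance on benign inputs) and attack success rate (ASR, how often triggered inputs land on the attacker's target class) for every defense run. Below is a minimal illustrative sketch of these two standard metrics; the helper names and toy data are hypothetical, not taken from the paper's framework:

```python
import numpy as np

def clean_accuracy(preds, labels):
    """Fraction of benign inputs classified correctly."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    return float(np.mean(preds == labels))

def attack_success_rate(preds_on_triggered, target_label, true_labels):
    """Fraction of triggered inputs that the model maps to the attacker's
    target label, excluding samples whose true class is already the target."""
    preds = np.asarray(preds_on_triggered)
    true_labels = np.asarray(true_labels)
    mask = true_labels != target_label  # common convention in ASR reporting
    return float(np.mean(preds[mask] == target_label))

# Toy example: six benign predictions and six predictions on triggered inputs.
acc = clean_accuracy([0, 1, 2, 2, 1, 0], [0, 1, 2, 0, 1, 0])
asr = attack_success_rate([7, 7, 3, 7, 7, 7], target_label=7,
                          true_labels=[1, 2, 3, 4, 7, 5])
```

A defense that lowers ASR while also lowering clean accuracy trades one failure mode for another, which is why the abstract flags evaluations that omit benign-input behavior.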