AI Summary
This work addresses the limitations of existing backdoor detection methods, which often rely on clean data or prior knowledge of triggers and struggle against sophisticated attacks with millisecond-level implantation. The authors propose DFBScanner, a novel approach that shifts the detection focus from identifying diverse triggers to uncovering a unified backdoor representation embedded in the model's final-layer parameters. By statically analyzing anomalous patterns in classification weight updates, DFBScanner enables lightweight, rapid, and attack-agnostic detection without requiring any prior assumptions. Integrating multidimensional anomaly indicators with a maximum anomaly scoring mechanism, the method achieves a true positive rate of 97.17% and a false positive rate of only 0.95% on a large-scale benchmark comprising over 5,000 models, with an average detection time of just one millisecond per model.
Abstract
Deep neural networks (DNNs), despite their remarkable performance, are highly vulnerable to backdoor attacks. Existing defenses mainly rely on activation anomaly analysis or trigger reverse engineering and often require clean samples or prior knowledge of trigger patterns, resulting in limited efficacy, practicability, and generalizability. More critically, while advanced attacks can implement backdoor implantation in milliseconds, current detection approaches typically demand minutes or even hours. To this end, we propose DFBScanner, a lightweight static parameter inspection framework for fast backdoor scanning. DFBScanner leverages our key observation that backdoor-induced feature perturbations can lead to distinctive and anomalous parameter updates in the final classification layer. Hence, we shift our detection focus from recognizing diverse and attack-specific trigger patterns targeted by prior work, to identifying the unified backdoor manifestation within the final layer, thereby enabling efficient and attack-agnostic detection. Specifically, by constructing and strategically combining multiple anomaly indicators of the final-layer parameters into a Trojan clue, DFBScanner detects backdoors through maximum anomaly scoring. DFBScanner is evaluated on a large-scale backdoor benchmark, including over 5,000 backdoor models trained on 4 datasets, 12 network architectures, 20 types of backdoor triggers, 2 attack strategies (all-to-one and all-to-all), and 3 backdoor injection methods (data poisoning, training pipeline manipulation, and bit-flips). Numerical results show that DFBScanner achieves a 97.17% true-positive rate, 0.95% false-positive rate, and an average detection time of only 1 ms per model, significantly outperforming prior methods.
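To make the idea of static final-layer inspection with maximum anomaly scoring more concrete, here is a minimal Python sketch. It is not DFBScanner's implementation: the abstract does not specify the anomaly indicators or how they are combined, so the three per-class indicators (row norm, row maximum, row mean), the median/MAD normalization, and the function names `robust_z` and `max_anomaly_score` are all assumptions chosen for illustration.

```python
import numpy as np


def robust_z(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # under normality; the small constant avoids division by zero.
    return 0.6745 * (values - med) / (mad + 1e-12)


def max_anomaly_score(weight, bias=None):
    """Score a classifier head from its parameters alone (no input data needed).

    weight: array of shape (num_classes, feature_dim), the final-layer weights.
    bias:   optional array of shape (num_classes,).
    Returns (score, suspect_class), where score is the largest absolute robust
    z-score over all indicators and classes.
    """
    weight = np.asarray(weight, dtype=float)
    # Hypothetical per-class indicators; the paper's actual indicators may differ.
    indicators = [
        np.linalg.norm(weight, axis=1),  # overall magnitude of each class row
        weight.max(axis=1),              # largest single weight per class
        weight.mean(axis=1),             # average weight per class
    ]
    if bias is not None:
        indicators.append(np.asarray(bias, dtype=float))

    # Shape (num_indicators, num_classes): how unusual each class looks per indicator.
    z = np.stack([np.abs(robust_z(ind)) for ind in indicators])
    per_class = z.max(axis=0)            # maximum anomaly over indicators
    suspect = int(per_class.argmax())    # most anomalous class
    return float(per_class[suspect]), suspect
```

Because the function only reads one weight matrix, its cost is independent of the rest of the network, which matches the lightweight, data-free setting the abstract describes. Any decision threshold on the returned score would have to be calibrated on known clean and backdoored models; no threshold value is given in the source.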