🤖 AI Summary
Deep neural networks are vulnerable to backdoor attacks, yet existing detection methods rely on clean or surrogate data, gradient information, or iterative optimization, resulting in high computational overhead and limited practicality. This work proposes HTell, a lightweight, data-free backdoor detection method that, for the first time, identifies backdoors by injecting architecture-aware random latent probes into the model’s prediction head and analyzing the concentration statistics of its class-wise responses. HTell requires no real or surrogate data, gradients, or parameter updates, substantially improving efficiency and applicability. Evaluated on a large-scale benchmark comprising over 6,000 backdoored models and over 700 clean models, HTell achieves a true positive rate of 99.03% and a false positive rate of 2.11%, with a per-model detection time of only 12.69 milliseconds, over 30,000× faster than representative gradient-based detectors.
📝 Abstract
Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness in practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without accessing real or surrogate data or model gradients and without any parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves a 99.03% true positive rate and a 2.11% false positive rate with a detection latency of only 12.69 ms per model, reducing the time cost by over 30,000× compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.
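The probing idea in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract does not specify how probes are made architecture-aware or which class-wise statistics are used, so the sketch assumes standard Gaussian probes, a single linear head, and the share of probes won by the most-favoured class as the concentration statistic. The names `probe_head`, `concentration`, `make_head`, `is_suspicious`, and the 0.8 threshold are all hypothetical, and the "backdoored" head is simulated crudely by boosting one class bias.

```python
import random
from collections import Counter


def probe_head(weights, bias, num_probes=2000, seed=0):
    """Feed random latent probes to a linear head (assumed form: W z + b).

    Returns, for each class, the fraction of probes it wins by argmax.
    No data, gradients, or parameter updates are involved.
    """
    rng = random.Random(seed)
    num_classes, dim = len(weights), len(weights[0])
    wins = Counter()
    for _ in range(num_probes):
        # Assumption: plain standard-normal probes in the latent space.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        logits = [
            sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weights, bias)
        ]
        wins[max(range(num_classes), key=logits.__getitem__)] += 1
    return [wins[c] / num_probes for c in range(num_classes)]


def concentration(shares):
    """Return (most-favoured class, its share of probe predictions)."""
    top = max(range(len(shares)), key=shares.__getitem__)
    return top, shares[top]


def is_suspicious(shares, threshold=0.8):
    """Flag a head whose responses pile up on one class (hypothetical rule)."""
    return concentration(shares)[1] > threshold


def make_head(num_classes, dim, seed, boosted_class=None, boost=0.0):
    """Build a random toy head; optionally bias one class to mimic a target."""
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)
    ]
    bias = [0.0] * num_classes
    if boosted_class is not None:
        bias[boosted_class] += boost  # toy stand-in for a backdoor target
    return weights, bias


if __name__ == "__main__":
    clean = make_head(10, 16, seed=1)
    toy_backdoored = make_head(10, 16, seed=1, boosted_class=3, boost=40.0)
    for name, (w, b) in [("clean", clean), ("toy backdoored", toy_backdoored)]:
        shares = probe_head(w, b)
        print(name, concentration(shares), is_suspicious(shares))
```

In this toy setting the clean head spreads its predictions across classes, while the biased head sends almost every probe to the boosted class. A real backdoor would not necessarily appear as a bias shift, so the sketch only shows the shape of the test, not its reported accuracy.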