Fast and Lightweight Backdoor Detection via Head Random Probing

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep neural networks are vulnerable to backdoor attacks, yet existing detection methods rely on clean or surrogate data, gradient information, or iterative optimization, which makes them computationally expensive and hard to apply in practice. This work proposes HTell, a lightweight, data-free backdoor detection method that, for the first time, identifies backdoors by injecting architecture-aware random latent probes into the model’s prediction head and analyzing the concentration statistics of its class-wise responses. HTell requires no real or surrogate data, gradients, or parameter updates, which substantially improves efficiency and applicability. Evaluated on a large-scale benchmark comprising more than 6,000 backdoored models and over 700 clean models, HTell achieves a true positive rate of 99.03% and a false positive rate of 2.11%, with a per-model detection time of only 12.69 milliseconds, over 30,000× faster than representative gradient-based detectors.

📝 Abstract

Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness under practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without using real or surrogate data, model gradients, or parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves a 99.03% true positive rate and a 2.11% false positive rate with a detection latency of only 12.69 ms per model, reducing the time cost by over 30,000× compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.
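
The probing idea in the abstract can be illustrated with a small NumPy sketch. This is not HTell itself: the abstract does not specify how probes are made architecture-aware, which concentration statistic is used, or how the decision threshold is set. The sketch assumes a plain linear head, standard Gaussian probes sized to the head's input dimension, the largest per-class prediction share as the statistic, and an arbitrary threshold of 0.5. The names `probe_head`, `concentration`, and `looks_backdoored` are hypothetical.

```python
import numpy as np


def probe_head(weight, bias, num_probes=2000, seed=0):
    """Return the fraction of random probes that a linear head assigns to each class."""
    rng = np.random.default_rng(seed)
    num_classes, feat_dim = weight.shape
    # Assumption: standard Gaussian probes in the head's input (latent) space.
    probes = rng.standard_normal((num_probes, feat_dim))
    # Forward pass through the head only; no data, gradients, or updates.
    logits = probes @ weight.T + bias
    preds = logits.argmax(axis=1)
    counts = np.bincount(preds, minlength=num_classes)
    return counts / num_probes


def concentration(class_fractions):
    """Largest share of probes captured by a single class."""
    return float(class_fractions.max())


def looks_backdoored(weight, bias, threshold=0.5):
    # Hypothetical decision rule; the threshold is illustrative, not from the paper.
    return concentration(probe_head(weight, bias)) > threshold


# Toy heads: a random 10-class head, and a copy whose bias strongly
# favours class 3 to mimic abnormal target-class concentration.
rng = np.random.default_rng(1)
clean_w = rng.standard_normal((10, 64))
clean_b = np.zeros(10)
bad_b = clean_b.copy()
bad_b[3] = 50.0

print("clean head flagged:", looks_backdoored(clean_w, clean_b))
print("skewed head flagged:", looks_backdoored(clean_w, bad_b))
```

Because only the head is evaluated on a few thousand random vectors, the cost is a single small matrix product, which is consistent in spirit with the millisecond-scale latency reported above. A real backdoor would show up in trained head weights rather than a hand-set bias; the toy bias is only a stand-in for that effect.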

Problem

Research questions and friction points this paper is trying to address.

backdoor detection

deep neural networks

model auditing

data-free

post-training

Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor detection

data-free

head random probing