🤖 AI Summary
This work addresses backdoor attacks on neural networks: a backdoored model behaves normally on benign inputs but executes attacker-specified actions when the input contains a specific trigger pattern, making the attack highly stealthy. To counter this threat, the paper introduces active path analysis, a novel approach to backdoor detection and removal that offers both interpretability and practical utility. By identifying anomalous activation paths that are exercised only by the backdoor trigger, the method localizes and eliminates the embedded backdoor. Experiments on compromised intrusion detection models show that the proposed technique accurately identifies backdoor triggers and successfully neutralizes them, confirming its effectiveness and robustness.
📝 Abstract
A backdoored machine learning model works as expected on normal inputs, but when the input contains a specific $\textit{trigger}$, it behaves as the attacker desires. Detecting such triggers has proven extremely difficult. In this paper, we present a novel and explainable approach to detecting and eliminating backdoor triggers based on active paths found in neural networks. We present promising experimental evidence for our approach, obtained by injecting backdoors into a machine learning model used for intrusion detection.
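The core idea of active-path analysis can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's implementation: the network weights, the planted backdoor neuron, the trigger pattern, and the frequency thresholds are all invented for demonstration. The sketch plants a hidden neuron that fires almost exclusively when a trigger is added to the input, then recovers it by comparing which neurons lie on the active path of triggered versus clean inputs.

```python
import numpy as np

# Toy sketch of active-path backdoor detection (hypothetical, not the paper's code).
# We plant a backdoor-like neuron (index 7) that activates almost only when a
# trigger pattern is present, then flag it by comparing per-neuron activation
# frequencies on clean vs. triggered inputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))        # one hidden ReLU layer, 8 neurons, 4 inputs
b1 = np.zeros(8)
W1[7] = [1.0, 0.0, 0.0, 0.0]        # planted neuron watches input feature 0
b1[7] = -3.0                        # silent on typical clean inputs
trigger = np.array([5.0, 0.0, 0.0, 0.0])  # hypothetical trigger pattern

def activation_freq(inputs):
    """Fraction of inputs on which each hidden ReLU neuron is active."""
    acts = np.stack([np.maximum(W1 @ x + b1, 0.0) > 0 for x in inputs])
    return acts.mean(axis=0)

clean = [rng.normal(size=4) for _ in range(200)]
clean_freq = activation_freq(clean)
trig_freq = activation_freq([x + trigger for x in clean])

# Neurons on the active path of triggered inputs but rarely active on clean
# ones are backdoor suspects; pruning them is one way to remove the backdoor.
suspects = np.flatnonzero((trig_freq > 0.9) & (clean_freq < 0.1))
print(suspects)  # the planted neuron 7 should be flagged
```

In this toy setting the planted neuron is the only one whose activation frequency jumps from near zero on clean inputs to near one under the trigger, which is the anomaly signature the paper's active-path analysis looks for.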