AI Summary
Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, and existing defenses often fail against adaptive adversaries due to their reliance on superficial features. This work proposes PRAETORIAN, the first defense framework grounded in the intrinsic mechanisms of backdoor attacks. By jointly analyzing the internal connectivity of trigger subgraphs and the external influence of their constituent nodes, PRAETORIAN identifies anomalous injected structures and high-impact trigger nodes. This approach forces attackers into a trade-off between attack success rate and the model's clean accuracy. Empirical results demonstrate that PRAETORIAN reduces the average attack success rate to 0.55% with only a 0.62% drop in clean accuracy, substantially outperforming current defenses and exhibiting strong robustness against diverse adaptive attacks.
Abstract
GNNs have become a standard tool for learning on relational data, yet they remain highly vulnerable to backdoor attacks. Prior defenses often depend on inspecting specific subgraph patterns or node features, and thus can be circumvented by adaptive attackers. We propose PRAETORIAN, a new defense that targets intrinsic requirements of effective GNN backdoors rather than surface-level cues. Our key observation is that flipping a victim node's prediction requires substantial influence on the victim: attackers tend to either inject many trigger nodes or rely on a small set of highly influential ones. Building on this observation, PRAETORIAN (i) analyzes internal correlations within potential trigger subgraphs to detect abnormally large injected structures, and (ii) quantifies external node influence to identify triggers with disproportionate impact. Across our evaluations, PRAETORIAN reduces the average attack success rate (ASR) to 0.55% with only a 0.62% drop in clean accuracy (CA), whereas state-of-the-art defenses still yield an average ASR of >20% and a CA drop of >3% under the same conditions. Moreover, PRAETORIAN remains effective against a range of adaptive attacks, forcing adversaries to either inject many trigger nodes to achieve high ASR (>80%), which incurs a >10% CA drop, or preserve CA at the cost of limiting ASR to 18.1%. Overall, PRAETORIAN constrains attackers to an unfavorable trade-off between efficacy and detectability.
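The two metrics quoted above, ASR and CA drop, can be made concrete with a minimal sketch. It assumes the common definitions (ASR as the fraction of trigger-attached victim nodes predicted as the attacker's target label, CA as plain accuracy on clean nodes); the abstract does not spell out its exact evaluation protocol, so the function names and conventions here are illustrative rather than the authors' implementation.

```python
from typing import Sequence


def attack_success_rate(preds_on_triggered: Sequence[int], target_label: int) -> float:
    """Fraction of trigger-attached victim nodes predicted as the target label.

    Assumption: every entry corresponds to a victim node with the trigger attached.
    Some evaluations also exclude nodes whose true label already equals the target.
    """
    if len(preds_on_triggered) == 0:
        raise ValueError("need at least one prediction on a triggered node")
    hits = sum(1 for p in preds_on_triggered if p == target_label)
    return hits / len(preds_on_triggered)


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Plain accuracy on clean (trigger-free) nodes."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equally long")
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


def ca_drop(ca_without_defense: float, ca_with_defense: float) -> float:
    """Clean-accuracy cost of applying a defense (positive means accuracy was lost)."""
    return ca_without_defense - ca_with_defense
```

Under these definitions, a defense is doing well when `attack_success_rate` approaches zero while `ca_drop` stays small, which is the trade-off the reported 0.55% ASR and 0.62% CA drop describe.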