Interpreting GNN-based IDS Detections Using Provenance Graph Structural Features

📅 2023-06-01
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
🤖 AI Summary
To address the limited interpretability, transparency, and verifiability of GNN-based provenance-graph intrusion detection systems, this paper proposes PROVEXPLAINER, an explainable surrogate-modeling framework that combines discriminative subgraph patterns with domain-adapted graph structural features to produce instance-level, provenance-aware explanations with clear security semantics. Targeting mainstream GNN detectors (e.g., GAT and GraphSAGE), PROVEXPLAINER trains an interpretable surrogate model whose feature space maps directly back to the system-provenance problem space, and evaluates explanation quality along two axes: the fidelity+/fidelity− metrics from the GNN-explanation literature and precision/recall against ground truth. On malware and APT datasets it outperforms state-of-the-art GNN explainers by up to 29% in fidelity+, 27% in precision, and 25% in recall, with 12% lower fidelity−, substantially narrowing the semantic gap between GNN explainers and real-world provenance security analysis.
📝 Abstract
Advanced cyber threats (e.g., Fileless Malware and Advanced Persistent Threat (APT)) have driven the adoption of provenance-based security solutions. These solutions employ Machine Learning (ML) models for behavioral modeling and critical security tasks such as malware and anomaly detection. However, the opacity of ML-based security models limits their broader adoption, as the lack of transparency in their decision-making processes restricts explainability and verifiability. We tailored our solution towards Graph Neural Network (GNN)-based security solutions since recent studies employ GNNs to comprehensively digest system provenance graphs for security critical tasks. To enhance the explainability of GNN-based security models, we introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model. PROVEXPLAINER's interpretable feature space consists of discriminant subgraph patterns and graph structural features, which can be directly mapped to the system provenance problem space, making the explanations human understandable. By considering prominent GNN architectures (e.g., GAT and GraphSAGE) for anomaly detection tasks, we show how PROVEXPLAINER synergizes with current state-of-the-art (SOTA) GNN explainers to deliver domain and instance-specific explanations. We measure the explanation quality using the fidelity+/fidelity- metric as used by traditional GNN explanation literature, and we incorporate the precision/recall metric where we consider the accuracy of the explanation against the ground truth. On malware and APT datasets, PROVEXPLAINER achieves up to 29%/27%/25% higher fidelity+, precision and recall, and 12% lower fidelity- respectively, compared to SOTA GNN explainers.
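The fidelity+/fidelity− metrics cited in the abstract are standard in the GNN-explanation literature: fidelity+ measures how much the prediction degrades when the explanation subgraph is removed (necessity), and fidelity− measures how little it degrades when only the explanation is kept (sufficiency). A minimal runnable sketch, with a toy scoring function standing in for a real GNN (the paper's models and datasets are not reproduced here):

```python
def fidelity_plus(model, graph, explanation, target_class):
    """Drop in predicted probability when the explanation is REMOVED.
    High fidelity+ means the explanation nodes were necessary."""
    full = model(graph)[target_class]
    without = model([n for n in graph if n not in explanation])[target_class]
    return full - without

def fidelity_minus(model, graph, explanation, target_class):
    """Drop in predicted probability when ONLY the explanation is kept.
    Low fidelity- means the explanation alone was sufficient."""
    full = model(graph)[target_class]
    only = model([n for n in graph if n in explanation])[target_class]
    return full - only

def toy_model(nodes):
    # Stand-in for a GNN detector: scores a provenance graph by counting
    # hypothetical "malicious process" nodes (an assumption, for illustration).
    p_attack = min(sum(n.startswith("proc:mal") for n in nodes) / 4, 1.0)
    return {0: 1.0 - p_attack, 1: p_attack}  # class 1 = "attack"

graph = ["proc:mal_a", "proc:mal_b", "file:tmp", "net:socket"]
explanation = {"proc:mal_a", "proc:mal_b"}

print(fidelity_plus(toy_model, graph, explanation, 1))   # 0.5: removal erases the signal
print(fidelity_minus(toy_model, graph, explanation, 1))  # 0.0: explanation is sufficient
```

A good explanation thus scores high on fidelity+ and low on fidelity−; the paper's added precision/recall axis further checks the explanation against ground-truth attack nodes.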
Problem

Research questions and friction points this paper is trying to address.

Enhancing explainability of GNN-based intrusion detection systems
Addressing the opacity of ML models for security tasks
Providing human-interpretable explanations for provenance graph analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses interpretable surrogate model for explanations
Leverages subgraph patterns and structural features
Combines fidelity metrics with precision/recall evaluation
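The structural-feature idea in the list above, an interpretable feature space that maps back to the provenance problem space, can be sketched as extracting simple graph statistics from a provenance edge list; the feature names below are illustrative, not the paper's exact feature set:

```python
from collections import Counter

def structural_features(edges):
    """Structural features of a provenance graph given as (parent, child)
    edges. The specific features are illustrative assumptions."""
    nodes = {u for u, v in edges} | {v for u, v in edges}
    out_deg = Counter(u for u, v in edges)
    return {
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "max_out_degree": max(out_deg.values(), default=0),
        "leaf_ratio": sum(1 for n in nodes if out_deg[n] == 0) / len(nodes),
    }

# Toy provenance graph: a process spawns a child and touches files/sockets.
edges = [("proc:p1", "proc:p2"), ("proc:p1", "file:f1"),
         ("proc:p2", "file:f2"), ("proc:p2", "net:n1")]
feats = structural_features(edges)
print(feats)

# An interpretable surrogate (e.g., a shallow decision tree) would be fit on
# such feature vectors to mimic the GNN's detections; a hand-written rule
# stands in for the fitted model here.
def surrogate_rule(f):
    return int(f["max_out_degree"] >= 2 and f["leaf_ratio"] >= 0.5)
```

Because each feature corresponds to a concrete provenance property (e.g., a high out-degree process is one that spawns or touches many objects), the surrogate's decisions read as system-level explanations rather than opaque embeddings.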