A Research and Development Portfolio of GNN Centric Malware Detection, Explainability, and Dataset Curation

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing graph neural network (GNN)-based malware detection approaches face critical bottlenecks in scalability, interpretability, and the scarcity of high-quality, labeled control-flow graphs (CFGs). To address these challenges, this paper proposes an efficient and interpretable detection framework. First, we construct the first open-source, large-scale, and comprehensively labeled CFG dataset. Second, we design an attention-guided stacked GNN architecture that integrates graph reduction and subgraph matching to enhance inference efficiency. Third, we introduce a dual-path explanation mechanism—combining gradient-based and prototype-based methods—alongside a novel consistency metric to significantly improve decision transparency. Experimental results demonstrate that our method achieves state-of-the-art detection accuracy while accelerating inference by 42% and attaining an explanation consistency score of 0.89. This work establishes a new paradigm for reproducible and verifiable explainable malware analysis.

Technology Category

Application Category

📝 Abstract
Graph Neural Networks (GNNs) have become an effective tool for malware detection by capturing program execution through graph-structured representations. However, important challenges remain regarding scalability, interpretability, and the availability of reliable datasets. This paper brings together six related studies that collectively address these issues. The portfolio begins with a survey of graph-based malware detection and explainability, then advances to new graph reduction methods, integrated reduction-learning approaches, and investigations into the consistency of explanations. It also introduces dual explanation techniques based on subgraph matching and develops ensemble-based models with attention-guided stacked GNNs to improve interpretability. In parallel, curated datasets of control flow graphs are released to support reproducibility and enable future research. Together, these contributions form a coherent line of research that strengthens GNN-based malware detection by enhancing efficiency, increasing transparency, and providing solid experimental foundations.
Problem

Research questions and friction points this paper is trying to address.

Addressing scalability challenges in GNN-based malware detection systems
Improving interpretability of malware detection through explainable AI techniques
Developing curated datasets to support reproducible malware detection research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph reduction methods for scalable malware detection
Ensemble models with attention-guided stacked GNNs
Dual explanation techniques using subgraph matching