🤖 AI Summary
Existing graph neural network (GNN)-based malware detection approaches face bottlenecks in scalability and interpretability, and suffer from a scarcity of high-quality, labeled control-flow graphs (CFGs). To address these challenges, this paper proposes an efficient and interpretable detection framework. First, we construct the first open-source, large-scale, and comprehensively labeled CFG dataset. Second, we design an attention-guided stacked GNN architecture that integrates graph reduction and subgraph matching to improve inference efficiency. Third, we introduce a dual-path explanation mechanism that combines gradient-based and prototype-based methods, together with a novel consistency metric, to improve decision transparency. Experimental results show that the method achieves state-of-the-art detection accuracy while accelerating inference by 42% and attaining an explanation consistency score of 0.89. This work establishes a new paradigm for reproducible and verifiable explainable malware analysis.
📝 Abstract
Graph Neural Networks (GNNs) have become an effective tool for malware detection by capturing program execution through graph-structured representations. However, important challenges remain regarding scalability, interpretability, and the availability of reliable datasets. This paper brings together six related studies that collectively address these issues. The portfolio begins with a survey of graph-based malware detection and explainability, then advances to new graph reduction methods, integrated reduction-learning approaches, and investigations into the consistency of explanations. It also introduces dual explanation techniques based on subgraph matching and develops ensemble-based models with attention-guided stacked GNNs to improve interpretability. In parallel, curated datasets of control-flow graphs (CFGs) are released to support reproducibility and enable future research. Together, these contributions form a coherent line of research that strengthens GNN-based malware detection by enhancing efficiency, increasing transparency, and providing solid experimental foundations.
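The abstract does not define how the consistency of explanations is measured. As a rough illustration only, one common way to compare two explainers is the Jaccard overlap between the top-k CFG nodes each marks as most important. The sketch below assumes that definition; the function names, the basic-block ids, and the scores are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: top-k Jaccard overlap between two explainers.
# This is an assumed stand-in, not the consistency metric used in the paper.

def top_k(scores, k):
    """Return the set of the k node ids with the highest importance score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def explanation_consistency(scores_a, scores_b, k):
    """Jaccard overlap of the top-k nodes chosen by two explainers (0.0 to 1.0)."""
    a, b = top_k(scores_a, k), top_k(scores_b, k)
    union = a | b
    # Two empty selections are treated as fully consistent.
    return len(a & b) / len(union) if union else 1.0


# Made-up importance scores over four basic blocks of one CFG.
explainer_a = {"bb0": 0.9, "bb1": 0.7, "bb2": 0.1, "bb3": 0.05}
explainer_b = {"bb0": 0.8, "bb1": 0.2, "bb2": 0.6, "bb3": 0.1}

score = explanation_consistency(explainer_a, explainer_b, k=2)
print(f"top-2 consistency: {score:.2f}")
```

A score of 1.0 means both explainers select the same k nodes, and 0.0 means they share none. The result depends on the choice of k, so any such score is best reported together with the k that was used.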