Mitigating Distribution Shift in Graph-Based Android Malware Classification via Function Metadata and LLM Embeddings

📅 2025-08-08
🤖 AI Summary
Problem: Existing graph-based Android malware classifiers lose up to 45% accuracy on unseen variants from known families, revealing the limits of shallow semantic modeling and poor generalization under distribution shift. Method: We propose a semantic-enhanced framework that (i) constructs a heterogeneous graph representation integrating function-level metadata and, when available, code embeddings generated by large language models; (ii) designs an elastic semantic enhancement mechanism that accepts multi-source feature inputs even when their availability is inconsistent; and (iii) jointly optimizes graph structure and semantic representation, remaining compatible with diverse GNN backbones and adaptation-based detection strategies. Contribution/Results: We release two new benchmarks, MalNet-Tiny-Common and MalNet-Tiny-Distinct, built by malware family partitioning to target cross-family and temporal distribution shifts. Experiments across multiple GNN backbones show performance gains of up to 8% under distribution shift and consistently better robustness and generalization to previously unseen variants.

📝 Abstract
Graph-based malware classifiers can achieve over 94% accuracy on standard Android datasets, yet we find they suffer accuracy drops of up to 45% when evaluated on previously unseen malware variants from the same family - a scenario where strong generalization would typically be expected. This highlights a key limitation in existing approaches: both the model architectures and their structure-only representations often fail to capture deeper semantic patterns. In this work, we propose a robust semantic enrichment framework that enhances function call graphs with contextual features, including function-level metadata and, when available, code embeddings derived from large language models. The framework is designed to operate under real-world constraints where feature availability is inconsistent, and supports flexible integration of semantic signals. To evaluate generalization under realistic domain and temporal shifts, we introduce two new benchmarks: MalNet-Tiny-Common and MalNet-Tiny-Distinct, constructed using malware family partitioning to simulate cross-family generalization and evolving threat behavior. Experiments across multiple graph neural network backbones show that our method improves classification performance by up to 8% under distribution shift and consistently enhances robustness when integrated with adaptation-based methods. These results offer a practical path toward building resilient malware detection systems in evolving threat environments.
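The abstract says the two benchmarks are constructed using malware family partitioning but does not give the procedure. As a rough illustration of the general idea only, the sketch below splits samples so that a chosen set of families appears exclusively in the test set. The function name `family_disjoint_split` and the example family labels are hypothetical, and this is not the authors' benchmark construction.

```python
from typing import List, Sequence, Set, Tuple


def family_disjoint_split(
    families: Sequence[str], held_out: Set[str]
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with held-out families only in test.

    `families[i]` is the family label of sample i. Every sample whose
    family is in `held_out` goes to the test set; all others go to train.
    """
    train_idx: List[int] = []
    test_idx: List[int] = []
    for i, fam in enumerate(families):
        if fam in held_out:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


# Hypothetical family labels, used only to show the call shape.
labels = ["famA", "famB", "famA", "famC"]
train, test = family_disjoint_split(labels, held_out={"famB", "famC"})
```

Partitioning by family rather than by random sample keeps near-duplicate variants of one family from leaking across the train/test boundary, which is one common way to simulate a shift between training and deployment data.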
Problem

Research questions and friction points this paper is trying to address.

Addressing accuracy drop in graph-based malware classifiers
Enhancing semantic patterns in function call graphs
Improving generalization under distribution and temporal shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances function call graphs with metadata
Integrates LLM-derived code embeddings flexibly
Introduces new benchmarks for realistic evaluation
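To make the first two points concrete, here is a minimal sketch of one way per-function node features could combine metadata with an optional embedding. It assumes zero-filling plus an availability flag when the embedding is missing; the summary does not state how the framework handles missing features, so this handling, the name `build_node_features`, and the width `EMBED_DIM` are all illustrative assumptions rather than the paper's method.

```python
from typing import List, Optional, Sequence

EMBED_DIM = 4  # hypothetical embedding width, chosen only for illustration


def build_node_features(
    metadata: Sequence[float],
    embedding: Optional[Sequence[float]],
    embed_dim: int = EMBED_DIM,
) -> List[float]:
    """Concatenate metadata, an embedding (or zeros), and an availability flag.

    The output length is always len(metadata) + embed_dim + 1, so nodes
    with and without an embedding can share one feature matrix.
    """
    if embedding is None:
        # Assumption: a missing embedding is zero-filled and flagged as absent.
        emb = [0.0] * embed_dim
        available = 0.0
    else:
        if len(embedding) != embed_dim:
            raise ValueError("embedding has unexpected dimension")
        emb = [float(x) for x in embedding]
        available = 1.0
    return [float(x) for x in metadata] + emb + [available]


# Two hypothetical function nodes: one without and one with an embedding.
without_emb = build_node_features([3.0, 1.0], None)
with_emb = build_node_features([3.0, 1.0], [0.1, 0.2, 0.3, 0.4])
```

The fixed-width output means the resulting vectors can be fed to any GNN backbone as node features, and the trailing flag lets the model distinguish a genuinely zero embedding from an absent one.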
Ngoc N. Tran
Department of Computer Science, Vanderbilt University
Anwar Said
AI Research Scientist at Institute for Software Integrated Systems, Vanderbilt University
Social Network Analysis, Graph Machine Learning, Graph Neural Networks, Gen AI, Data Science
Waseem Abbas
Department of Systems Engineering, The University of Texas at Dallas
Tyler Derr
Department of Computer Science, Vanderbilt University
Xenofon D. Koutsoukos
Department of Computer Science, Vanderbilt University