MARD: A Multi-Agent Framework for Robust Android Malware Detection

📅 2026-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three key challenges in Android malware detection (concept drift, shallow feature representation, and insufficient interpretability) by proposing the first multi-agent collaborative detection framework. Integrating the semantic reasoning capabilities of large language models with a static analysis engine, the approach uses the ReAct paradigm to enable autonomous agent interaction and evidence chain generation. It operates without domain-specific fine-tuning and achieves an F1 score of 93.46%, outperforming continual learning baselines in evaluations spanning up to five years. With an analysis cost of less than $0.10 per APK, the method demonstrates robustness against concept drift, strong cross-domain generalization, and transparent, interpretable decision-making.
📝 Abstract
With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable semantic reasoning capabilities, directly processing massive raw code incurs prohibitive token overhead. Moreover, this approach fails to fully unleash the deep logical reasoning potential of LLMs within complex contexts. To address these limitations, we propose MARD, a multi-agent framework for robust Android malware detection. This framework effectively bridges the gap between the semantic understanding of LLMs and traditional static analysis. It treats underlying deterministic analysis engines as on-demand execution tools, while utilizing the LLM to orchestrate the entire decision-making process. By designing an autonomous multi-agent interaction mechanism based on the ReAct paradigm, MARD constructs a highly interpretable evidentiary chain for conviction. Furthermore, we radically reduce the total cost of conducting a deep analysis of a single complex APK to under $0.10. Evaluations demonstrate that, without any domain-specific fine-tuning, MARD achieves an F1 score of 93.46%. It not only outperforms continual learning baselines but also exhibits robustness against concept drift and strong cross-domain generalization capabilities in evaluations spanning up to five years.
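The tool-orchestration idea in the abstract (deterministic analysis engines called on demand, with a reasoning loop deciding what to call next and recording each step as evidence) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the two stub tools, the `scripted_policy` function that stands in for the LLM, and the toy verdict rule. The sketch shows only the general shape of a ReAct-style thought, action, observation loop, not MARD's actual agents or prompts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical static-analysis tools. A real engine would parse the APK;
# these stubs return canned facts so the sketch runs without one.
def list_permissions(apk: str) -> List[str]:
    return ["SEND_SMS", "READ_CONTACTS", "INTERNET"]

def find_api_calls(apk: str) -> List[str]:
    return ["SmsManager.sendTextMessage", "HttpURLConnection.connect"]

TOOLS: Dict[str, Callable[[str], List[str]]] = {
    "list_permissions": list_permissions,
    "find_api_calls": find_api_calls,
}

@dataclass
class Step:
    """One link in the evidence chain: why, what was run, what came back."""
    thought: str
    action: str
    observation: List[str]

def scripted_policy(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: picks the next (thought, action) from history."""
    if len(history) == 0:
        return "Check which permissions are requested.", "list_permissions"
    if len(history) == 1:
        return "SMS permission found; look for matching API use.", "find_api_calls"
    return "Enough evidence collected.", "finish"

def analyze(apk: str, policy=scripted_policy, max_steps: int = 5):
    """Run the thought -> action -> observation loop until 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action = policy(history)
        if action == "finish":
            break
        observation = TOOLS[action](apk)  # tool is executed only on demand
        history.append(Step(thought, action, observation))
    # Toy verdict rule, an assumption made purely for illustration.
    evidence = {item for step in history for item in step.observation}
    suspicious = ("SEND_SMS" in evidence
                  and "SmsManager.sendTextMessage" in evidence)
    return ("malicious" if suspicious else "benign"), history
```

Calling `analyze("sample.apk")` returns a verdict together with the list of `Step` records, which plays the role of the evidentiary chain: each conclusion can be traced back to the tool output that supported it. In a real system the policy would be an LLM call and the tools would wrap an actual static analyzer, so only the small tool outputs, rather than the raw code, would enter the model's context.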
Problem

Research questions and friction points this paper is trying to address.

Android malware detection
concept drift
semantic understanding
interpretability
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Framework
Large Language Models
Android Malware Detection
Concept Drift Robustness
Interpretable Reasoning
Xueying Zeng
School of Computer Science and Engineering, Beihang University, Beijing, China
Youquan Xian
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Sihao Liu
UCLA
Computer Architecture, VLSI, CPU, FPGA
Xudong Mou
Beihang University
Yanze Li
School of Computer Science and Engineering, Beihang University, Beijing, China
Lei Cui
Deakin University, School of IT
Bo Li
Associate Professor, Beihang University
big data