MARD: A Multi-Agent Framework for Robust Android Malware Detection

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges in Android malware detection (concept drift, shallow feature representation, and insufficient interpretability) by proposing what the authors describe as the first multi-agent collaborative detection framework. It combines the semantic reasoning of large language models with a static analysis engine, and uses the ReAct paradigm to let agents interact autonomously and build an evidence chain for each verdict. Without any domain-specific fine-tuning, it achieves an F1 score of 93.46% and outperforms continual learning baselines in evaluations spanning up to five years. At an analysis cost of under $0.10 per APK, the method shows robustness to concept drift, strong cross-domain generalization, and transparent, interpretable decision-making.

📝 Abstract

With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable semantic reasoning capabilities, directly processing massive raw code incurs prohibitive token overhead. Moreover, this approach fails to fully unleash the deep logical reasoning potential of LLMs within complex contexts. To address these limitations, we propose MARD, a multi-agent framework for robust Android malware detection. This framework effectively bridges the gap between the semantic understanding of LLMs and traditional static analysis. It treats underlying deterministic analysis engines as on-demand execution tools, while utilizing the LLM to orchestrate the entire decision-making process. By designing an autonomous multi-agent interaction mechanism based on the ReAct paradigm, MARD constructs a highly interpretable evidentiary chain for conviction. Furthermore, we radically reduce the total cost of conducting a deep analysis of a single complex APK to under $0.10. Evaluations demonstrate that, without any domain-specific fine-tuning, MARD achieves an F1 score of 93.46%. It not only outperforms continual learning baselines but also exhibits robustness against concept drift and strong cross-domain generalization capabilities in evaluations spanning up to five years.
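The abstract describes an LLM that orchestrates deterministic analysis engines as on-demand tools in a ReAct-style loop (reason, act, observe) and accumulates an evidence chain. The following is a minimal sketch of that general pattern only, not MARD's actual implementation: the tool names (`list_permissions`, `find_api_calls`), the dictionary-based APK representation, and the `scripted_policy` function that stands in for the LLM are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# One step of the evidence chain: (thought, action, observation).
Step = Tuple[str, str, str]


# Hypothetical static-analysis tools. A real system would query an actual
# analysis engine; here an "APK" is just a dict of pre-extracted facts.
def list_permissions(apk: dict) -> str:
    return ", ".join(sorted(apk["permissions"]))


def find_api_calls(apk: dict) -> str:
    return ", ".join(sorted(apk["api_calls"]))


TOOLS: Dict[str, Callable[[dict], str]] = {
    "list_permissions": list_permissions,
    "find_api_calls": find_api_calls,
}


def scripted_policy(chain: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM (assumption): returns (thought, action)."""
    used = {action for _, action, _ in chain}
    if "list_permissions" not in used:
        return "Check the requested permissions first.", "list_permissions"
    if "find_api_calls" not in used:
        return "Permissions alone are inconclusive; inspect API calls.", "find_api_calls"
    observed = " ".join(observation for _, _, observation in chain)
    suspicious = "SEND_SMS" in observed and "sendTextMessage" in observed
    verdict = "malicious" if suspicious else "benign"
    return "Enough evidence has been gathered.", "FINISH:" + verdict


def react_loop(
    apk: dict,
    policy: Callable[[List[Step]], Tuple[str, str]],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate reasoning and tool calls until the policy finishes."""
    chain: List[Step] = []
    for _ in range(max_steps):
        thought, action = policy(chain)
        if action.startswith("FINISH:"):
            return action.split(":", 1)[1], chain
        observation = TOOLS[action](apk)  # deterministic tool executes on demand
        chain.append((thought, action, observation))
    return "undetermined", chain  # step budget exhausted without a verdict
```

Because each tool returns only a compact observation, the policy never sees raw code, which is the general idea behind keeping token overhead low. The returned chain doubles as a human-readable record of why a verdict was reached.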

Problem

Research questions and friction points this paper is trying to address.

Android malware detection

concept drift

semantic understanding

interpretability

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Framework

Large Language Models

Android Malware Detection