LAMD: Context-driven Android Malware Detection and Classification with LLMs

📅 2025-02-18
🤖 AI Summary
Android malware detection struggles with rapidly evolving attacks, dataset bias, and poor interpretability, while large language models (LLMs) are limited in handling long contexts and understanding code structure. Method: This paper proposes LAMD, a context-driven LLM analysis framework with three components: (i) key context extraction, which isolates security-critical code regions and constructs program structures; (ii) tier-wise code reasoning, which analyzes application behavior progressively from low-level instructions to high-level semantics; and (iii) factual consistency verification, which mitigates hallucination starting from the first tier. The framework builds on the zero-shot inference and reasoning capabilities of LLMs. Contribution/Results: Evaluation in real-world settings shows that LAMD is more effective than conventional detectors and returns an explanation alongside each prediction, establishing a feasible basis for LLM-driven malware analysis in dynamic threat landscapes.

📝 Abstract
The rapid growth of mobile applications has escalated Android malware threats. Although there are numerous detection methods, they often struggle with evolving attacks, dataset biases, and limited explainability. Large Language Models (LLMs) offer a promising alternative with their zero-shot inference and reasoning capabilities. However, applying LLMs to Android malware detection presents two key challenges: (1) the extensive support code in Android applications, often spanning thousands of classes, exceeds LLMs' context limits and obscures malicious behavior within benign functionality; (2) the structural complexity and interdependencies of Android applications surpass LLMs' sequence-based reasoning, fragmenting code analysis and hindering malicious intent inference. To address these challenges, we propose LAMD, a practical context-driven framework to enable LLM-based Android malware detection. LAMD integrates key context extraction to isolate security-critical code regions and construct program structures, then applies tier-wise code reasoning to analyze application behavior progressively, from low-level instructions to high-level semantics, producing a final prediction and explanation. A factual consistency verification mechanism mitigates LLM hallucinations starting from the first tier. Evaluation in real-world settings demonstrates LAMD's effectiveness over conventional detectors, establishing a feasible basis for LLM-driven malware analysis in dynamic threat landscapes.
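The idea of isolating security-critical code regions can be illustrated with a minimal sketch. It is not LAMD's actual extraction procedure, which the abstract does not detail. It assumes the app is already available as a call graph (a plain dictionary), and that a set of sensitive API names is given. The names `SUSPICIOUS_APIS`, `extract_key_context`, and the example graph are all hypothetical.

```python
from collections import deque

# Hypothetical set of security-sensitive APIs (an assumption, not the paper's list).
SUSPICIOUS_APIS = {"sendTextMessage", "getDeviceId", "exec"}


def extract_key_context(call_graph, suspicious_apis=SUSPICIOUS_APIS):
    """Return the app functions from which a suspicious API is reachable.

    call_graph maps each function name to the list of names it calls.
    """
    # Reverse the edges so we can walk from a callee back to its callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search backwards from every suspicious API that is called.
    queue = deque(api for api in suspicious_apis if api in callers)
    reached = set(queue)
    while queue:
        node = queue.popleft()
        for parent in callers.get(node, ()):
            if parent not in reached:
                reached.add(parent)
                queue.append(parent)

    # Report only application functions, not the APIs themselves.
    return reached - set(suspicious_apis)


# Toy call graph: only one path leads to a sensitive API.
call_graph = {
    "MainActivity.onCreate": ["initUi", "startService"],
    "startService": ["collectInfo"],
    "collectInfo": ["getDeviceId", "upload"],
    "upload": [],
    "initUi": ["setContentView"],
}
key_functions = extract_key_context(call_graph)
```

In this toy graph, `initUi` and `upload` are dropped because no sensitive API is reachable from them, which shrinks the code that would have to fit into the model's context.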
Problem

Research questions and friction points this paper is trying to address.

Detect Android malware using Large Language Models
Overcome context limits and code complexity challenges
Enhance explainability and reduce dataset biases in detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-driven LLM framework
Tier-wise code reasoning
Factual consistency verification
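Tier-wise reasoning and first-tier verification can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the `llm` argument is any callable from prompt string to text, the prompts are invented, and the consistency check (every known API named in a summary must occur in the summarized code) is a simple stand-in for whatever verification LAMD actually performs. All function names are hypothetical.

```python
import re


def is_factually_consistent(code, summary, known_apis):
    """Accept a summary only if every known API it names occurs in the code."""
    mentioned = {
        api for api in known_apis
        if re.search(rf"\b{re.escape(api)}\b", summary)
    }
    return all(api in code for api in mentioned)


def verified_summary(code, llm, known_apis, max_retries=2):
    """Tier 1: summarize one function, retrying when the check fails."""
    for _ in range(max_retries + 1):
        summary = llm(f"Describe the behavior of this function:\n{code}")
        if is_factually_consistent(code, summary, known_apis):
            return summary
    return None  # No summary passed the check.


def tiered_verdict(functions, llm, known_apis):
    """Run three hypothetical tiers: instructions, logic, semantics."""
    # Tier 1: verified per-function summaries (unverified ones are dropped).
    candidates = (verified_summary(code, llm, known_apis) for code in functions)
    summaries = [s for s in candidates if s is not None]
    # Tier 2: relate the individual behaviors to each other.
    logic = llm("Explain how these behaviors relate:\n" + "\n".join(summaries))
    # Tier 3: final prediction with an explanation.
    return llm("Is this app malicious? Explain.\n" + logic)


def make_stub_llm(responses):
    """Stand-in for a real model: replays canned responses in order."""
    it = iter(responses)
    return lambda prompt: next(it)


code = "String id = tm.getDeviceId(); upload(id);"
known_apis = {"getDeviceId", "sendTextMessage"}
# The first canned answer hallucinates an API that is absent from the code.
llm = make_stub_llm([
    "Sends an SMS via sendTextMessage.",
    "Reads the device ID via getDeviceId.",
])
summary = verified_summary(code, llm, known_apis)
```

Verifying at the first tier matters because later tiers only see summaries: an API hallucinated there would otherwise propagate into the final verdict unchecked.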