Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

📅 2025-07-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work systematically investigates the adverse impact of concept drift on Android malware detection models. To address feature distribution shifts caused by rapid malware evolution, we comparatively evaluate the drift robustness of nine machine learning and deep learning algorithms, along with large language models (LLMs) under few-shot settings, across five feature modalities: static, dynamic, hybrid, semantic, and image-based. Results demonstrate that concept drift is pervasive and significantly degrades detection accuracy (by 12.7% on average). Conventional class-balancing techniques mitigate label imbalance but fail to counteract distributional shift. While LLMs exhibit superior drift resilience due to strong generalization capability, their performance remains constrained by prompt quality and domain-specific adaptation. This study is the first to quantify the drift impact mechanism within a multimodal feature-LLM collaborative framework, thereby establishing both theoretical foundations and practical guidelines for developing sustainably adaptive Android security detection systems.

📝 Abstract
Despite outstanding results, machine learning-based Android malware detection models struggle with concept drift, where rapidly evolving malware characteristics degrade model effectiveness. This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types--static, dynamic, hybrid, semantic, and image-based--were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and concept drift; the algorithm's impact was relatively minor compared to other variables, since hyperparameters were not fine-tuned and the default algorithm configurations were used. While LLMs using few-shot learning demonstrated promising detection performance, they did not fully mitigate concept drift, highlighting the need for further investigation.
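The degradation the abstract describes can be illustrated with a minimal toy simulation. This sketch is not the paper's datasets, features, or models: it draws synthetic Gaussian "feature vectors" for benign apps, same-era malware, and drifted malware (whose distribution has shifted toward benign), trains a simple nearest-centroid classifier on the older data, and compares accuracy before and after the shift.

```python
import random

random.seed(0)

def sample(mean, n, dim=5, spread=1.0):
    """Draw n synthetic feature vectors centered at `mean` (illustrative only)."""
    return [[random.gauss(mean, spread) for _ in range(dim)] for _ in range(n)]

# Hypothetical distributions: benign ~0, training-era malware ~2,
# drifted (evolved) malware ~1, i.e. closer to benign behavior.
benign_train = sample(0.0, 200)
mal_train    = sample(2.0, 200)
benign_test  = sample(0.0, 100)
mal_same     = sample(2.0, 100)   # same era as the training data
mal_drifted  = sample(1.0, 100)   # later malware with shifted features

def centroid(xs):
    dim = len(xs[0])
    return [sum(x[i] for x in xs) / len(xs) for i in range(dim)]

c_ben, c_mal = centroid(benign_train), centroid(mal_train)

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict(x):
    """Nearest-centroid rule: 1 = malware, 0 = benign."""
    return 1 if sq_dist(x, c_mal) < sq_dist(x, c_ben) else 0

def accuracy(benign, malware):
    correct = sum(predict(x) == 0 for x in benign) \
            + sum(predict(x) == 1 for x in malware)
    return correct / (len(benign) + len(malware))

acc_now = accuracy(benign_test, mal_same)
acc_drift = accuracy(benign_test, mal_drifted)
print(f"in-distribution accuracy: {acc_now:.2f}, after drift: {acc_drift:.2f}")
```

Because the drifted malware lies between the two training centroids, the fixed classifier misses a large share of it even though nothing about the benign class changed, which mirrors the paper's finding that drift, not class imbalance, drives the loss.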
Problem

Research questions and friction points this paper is trying to address.

Evaluating concept drift impact on Android malware detection models
Assessing various feature types and algorithms for drift resilience
Exploring LLMs' potential and limitations in mitigating concept drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated multiple ML and deep learning algorithms
Assessed various feature types for malware detection
Explored LLMs with few-shot learning capabilities
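The few-shot LLM setup mentioned above can be sketched as a prompt-construction step. The template, label names, and example app descriptions below are hypothetical, not the paper's actual prompts; the point is only the shape of few-shot classification, where a handful of labeled examples precede the query sample.

```python
# Illustrative labeled examples (hypothetical permission/behavior summaries).
EXAMPLES = [
    ("requests SEND_SMS, READ_CONTACTS; connects to a hard-coded IP", "malicious"),
    ("requests INTERNET, ACCESS_NETWORK_STATE; uses a standard ad SDK", "benign"),
]

def build_prompt(sample_desc, examples=EXAMPLES):
    """Assemble a few-shot classification prompt from labeled examples."""
    lines = ["Classify the Android app described below as 'benign' or 'malicious'.", ""]
    for desc, label in examples:
        lines.append(f"App: {desc}\nLabel: {label}\n")
    # The query sample ends with an open "Label:" for the model to complete.
    lines.append(f"App: {sample_desc}\nLabel:")
    return "\n".join(lines)

prompt = build_prompt("requests RECEIVE_BOOT_COMPLETED; loads dex from assets at runtime")
print(prompt)
```

As the abstract notes, this style of prompting yields promising detection but does not by itself resolve drift: the in-context examples age just like training data, so example selection and domain adaptation remain open problems.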