DREAM: Combating Concept Drift with Explanatory Detection and Adaptation in Malware Classification

📅 2024-05-07
📈 Citations: 2
Influential: 0
🤖 AI Summary
Android malware classifiers suffer severe performance degradation under concept drift, for example when novel malware families emerge rapidly. Existing drift detection and retraining approaches depend on expert annotation, generalize poorly, and incur high annotation overhead. This paper proposes the first explainable concept drift adaptation framework for Android malware classification. It jointly updates a semi-supervised drift detector and the classifier, using latent-space concept embeddings and sensitivity analysis to localize semantic shifts in malicious behavior in an interpretable way. It also introduces autonomous data generation and explainability-guided feedback so that human intervention is precise and minimal. Extensive experiments across multiple real-world datasets and classifier architectures show that the method significantly improves drift detection accuracy, reduces expert annotation effort by over 50%, and achieves strong generalizability and deployment stability.

📝 Abstract
Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially the emergence of new families, can depress classification accuracy to near-random levels. Previous research has focused primarily on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs. To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explanatory drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised manner, proactively captures malware behavior concepts through classifier feedback. During testing, it uses samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM broadens the scope of human intervention, enabling experts to revise both malware labels and the concept explanations embedded in the detector's latent space. To ensure a comprehensive response to concept drift, it coordinates the updates of both the classifier and the detector. Our evaluation shows that DREAM effectively improves drift detection accuracy and reduces the expert analysis effort needed for adaptation across different malware datasets and classifiers.
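The abstract does not spell out how a test sample is scored as drifting. Purely as an illustration, the sketch below shows a common latent-space heuristic, which is not DREAM's own detector. It assumes an encoder has already mapped samples to latent vectors. For each known family (class), it records the centroid and the median absolute deviation (MAD) of training distances to that centroid. A test sample is then flagged when its robust distance to every centroid is large. The function names and the 3.5 threshold are hypothetical choices.

```python
import numpy as np

def fit_class_stats(z_train, y_train):
    """Per-class latent centroid plus median and scaled MAD of distances to it."""
    z_train = np.asarray(z_train, dtype=float)
    y_train = np.asarray(y_train)
    stats = {}
    for c in np.unique(y_train):
        z_c = z_train[y_train == c]
        centroid = z_c.mean(axis=0)
        d = np.linalg.norm(z_c - centroid, axis=1)
        med = np.median(d)
        # 1.4826 makes the MAD consistent with a standard deviation under normality
        mad = 1.4826 * np.median(np.abs(d - med))
        stats[c] = (centroid, med, max(mad, 1e-12))  # guard against zero spread
    return stats

def drift_scores(z_test, stats):
    """Smallest robust z-score of the distance to any class centroid (higher = more novel)."""
    z_test = np.asarray(z_test, dtype=float)
    scores = []
    for z in z_test:
        per_class = [(np.linalg.norm(z - cen) - med) / mad for cen, med, mad in stats.values()]
        scores.append(min(per_class))
    return np.array(scores)
```

A sample lying close to some known family receives a low score. A sample far from every family, as a new family would be, receives a high score, so `drift_scores(z_test, stats) > 3.5` yields a boolean drift mask under this hypothetical threshold.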
Problem

Research questions and friction points this paper is trying to address.

Detecting and adapting to concept drift in Android malware classification
Reducing human labeling costs and improving drift detection accuracy
Integrating classifier and expert knowledge for effective model retraining
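One simple way to bound human labeling cost is to let experts review only the most suspicious samples, up to a fixed budget. The sketch below assumes per-sample drift scores are already available, for example from a latent-distance detector. `select_for_labeling`, the budget, and the threshold are hypothetical and are not taken from the paper.

```python
import numpy as np

def select_for_labeling(scores, budget, threshold=3.5):
    """Return up to `budget` indices of samples whose drift score exceeds `threshold`,
    most suspicious first."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(scores > threshold)
    # Sort candidates by descending score; a stable sort keeps input order on ties
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    return order[:budget].tolist()
```

For instance, with scores `[0.5, 9.0, 4.0, 12.0, 3.0]` and a budget of 2, only indices 3 and 1 are sent for expert analysis.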
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates classifier feedback and expert knowledge into coordinated updates of the detector and classifier
Uses a contrastive autoencoder to learn explainable malware concepts
Reduces the number of labeled samples needed for retraining
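The summary mentions a contrastive autoencoder but gives no objective. As a generic sketch that is not DREAM's exact loss, the code below pairs a reconstruction term with the classic pairwise contrastive loss. Same-label latent pairs are pulled together (squared distance), and different-label pairs are pushed apart until they are at least `margin` away. The `margin` and `lam` weight are hypothetical hyperparameters.

```python
import numpy as np

def contrastive_ae_loss(x, x_hat, z, y, margin=1.0, lam=0.1):
    """Reconstruction MSE plus lam * mean pairwise contrastive loss over latent codes z."""
    x, x_hat, z, y = (np.asarray(a) for a in (x, x_hat, z, y))
    recon = np.mean((x - x_hat) ** 2)
    # All unordered pairs (i, j) with i < j
    i, j = np.triu_indices(len(z), k=1)
    d = np.linalg.norm(z[i] - z[j], axis=1)
    same = y[i] == y[j]
    # Same label: penalize distance; different label: penalize being closer than margin
    pair_loss = np.where(same, d ** 2, np.maximum(0.0, margin - d) ** 2)
    return recon + lam * pair_loss.mean()
```

In practice the same objective would be written in an autodiff framework so the encoder and decoder can be trained by gradient descent. NumPy is used here only to show the arithmetic.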