Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection

📅 2026-05-22

🤖 AI Summary

This work addresses the performance degradation of Android malware detectors caused by concept drift after deployment, where frequent full retraining is prohibitively costly. The authors propose a chronological adaptive maintenance framework that formulates detector upkeep as a sequential decision-making problem. A self-supervised encoder is trained during initialization and then frozen. It provides stable representations in which latent drift can be measured, while updates are made efficiently through a lightweight trainable adapter and classification head. A reinforcement learning controller based on Proximal Policy Optimization (PPO) selects low-cost maintenance actions from the detector's current state. The study combines self-supervised and reinforcement learning to tackle concept drift, yielding a cost-aware maintenance strategy. Experiments on emulator and real-world Android malware datasets show that the controller consistently ranks among the top-performing policies. It also achieves a favorable trade-off among temporal performance, memory retention, and maintenance overhead.
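To make the sequential-decision framing concrete, the sketch below scores candidate maintenance actions with a cost-aware reward. It is a minimal illustration under stated assumptions, not the authors' method. The action names, their relative costs, the reward weights `alpha` and `lam`, and the `greedy_action` helper are all hypothetical. The one-step lookahead only stands in for what a trained PPO policy would learn from experience.

```python
from typing import Dict, Tuple

# Hypothetical maintenance actions and relative costs (not taken from the paper).
ACTION_COSTS: Dict[str, float] = {
    "no_update": 0.0,
    "update_head": 1.0,
    "update_adapter_and_head": 3.0,
}


def reward(utility: float, retention: float, cost: float,
           alpha: float = 0.5, lam: float = 0.1) -> float:
    """Assumed cost-aware reward: a weighted mix of current utility and
    memory-set retention, minus a penalty proportional to update cost."""
    return (1.0 - alpha) * utility + alpha * retention - lam * cost


def greedy_action(predicted: Dict[str, Tuple[float, float]],
                  alpha: float = 0.5, lam: float = 0.1) -> str:
    """One-step lookahead stand-in for a learned (e.g. PPO) policy.

    `predicted` maps each action name to the (utility, retention) pair
    expected after taking that action; the highest-reward action is returned.
    """
    return max(
        predicted,
        key=lambda a: reward(*predicted[a], ACTION_COSTS[a], alpha=alpha, lam=lam),
    )
```

Raising `lam` pushes the choice toward cheaper actions such as `no_update`. Lowering it lets a larger but costlier adapter update win. A PPO controller would instead learn this trade-off from rewards observed over many time steps, without needing the per-action predictions assumed here.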

📝 Abstract

Android malware detectors often degrade after deployment because of concept drift, while full retraining at each maintenance step is costly. We propose a chronological adaptive maintenance framework that models deployment-time maintenance as a sequential decision problem. The framework learns a stable latent representation through self-supervised learning during initialization, freezes the encoder, measures latent drift in the fixed representation space, and performs lightweight downstream adaptation using a trainable adapter and classification head. A proximal policy optimization controller selects low-cost maintenance actions based on the detector state, including current utility, retention on a fixed memory set, latent drift indicators, and update cost. We evaluate the framework under a causal deployment-style protocol on emulator and real Android malware datasets with static and dynamic features. Results show that the RL controller provides a strong cost-aware adaptation strategy, consistently remaining among the top-performing policies while achieving a favorable balance between temporal performance, memory retention, and maintenance cost under non-stationary deployment conditions.
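The abstract lists four inputs to the controller: current utility, retention on a fixed memory set, latent drift indicators, and update cost. The sketch below assumes one simple drift indicator, the Euclidean distance between the mean embeddings of a reference window and the current window in the frozen encoder's space. The paper's actual indicators are not specified here, so `centroid`, `centroid_shift`, `controller_state`, and the feature order are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(embeddings: Sequence[Vector]) -> List[float]:
    """Mean embedding of a window; assumes a non-empty window of equal-length vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def centroid_shift(reference: Sequence[Vector], current: Sequence[Vector]) -> float:
    """Hypothetical latent drift indicator: Euclidean distance between the
    window means, both computed in the same frozen representation space."""
    return math.dist(centroid(reference), centroid(current))


def controller_state(utility: float, retention: float,
                     drift: float, update_cost: float) -> List[float]:
    """Assumed state vector, ordered as in the abstract:
    current utility, memory-set retention, latent drift, update cost."""
    return [utility, retention, drift, update_cost]
```

Freezing the encoder is what makes such a comparison meaningful. If the encoder were retrained between steps, the reference and current embeddings would live in different spaces, and their distance would mix model change with data drift.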

Problem

concept drift

Android malware detection

model maintenance

non-stationary environments

cost-aware adaptation

Innovation

Concept Drift Adaptation

Self-Supervised Learning

Reinforcement Learning