Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset

📅 2026-04-24

🤖 AI Summary
This work addresses temporal bias in Android malware detection, which arises when models are trained and evaluated without regard to app release times. The authors construct a time-stamped dataset of benign and malicious apps and apply a timestamp-verification procedure so that the recorded release times are accurate. They then combine BYOL-based self-supervised pre-training, intended to learn obfuscation-resilient representations, with supervised classification. Under a time-aware evaluation protocol, the approach achieves 98% accuracy and an 89% F1 score. The study also characterizes malware behavior by analyzing true positives and false negatives with VirusTotal and the MITRE ATT&CK framework. The dataset and source code are released to support reproducibility and future research.
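To make the BYOL step more concrete, here is a minimal sketch of the two ingredients that define BYOL in general: a loss that pulls the online network's prediction toward the target network's projection, and a target network updated as an exponential moving average of the online network. It uses plain Python lists in place of tensors. The function names, the decay value `tau`, and the toy vectors are illustrative assumptions; the summary does not specify the paper's encoder, augmentations, or hyperparameters.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def byol_loss(online_prediction, target_projection):
    """Normalized MSE between two views: 2 - 2 * cosine similarity.

    0.0 means the two representations point the same way,
    4.0 means they point in opposite directions.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters slowly toward the online parameters.

    tau is a hypothetical decay value chosen for illustration.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]


# Toy usage: two augmented views of the same app should map close together.
view_a_prediction = [0.9, 0.1]   # hypothetical online-network output
view_b_projection = [1.0, 0.0]   # hypothetical target-network output
loss = byol_loss(view_a_prediction, view_b_projection)
new_target = ema_update([1.0, 1.0], [0.0, 0.0], tau=0.9)
```

Because the loss depends only on the direction of the two vectors, no negative pairs or labels are needed, which is what allows the pre-training stage to run on unlabeled apps before the supervised classifier is trained.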

📝 Abstract
Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and malicious Android apps and introducing a timestamp-verification procedure to ensure temporal accuracy. We then propose a detection framework that uses Bootstrap Your Own Latent (BYOL) for self-supervised pre-training to learn obfuscation-resilient representations, followed by supervised classification. Under time-aware evaluation, the method attains 98% accuracy and 89% F1. We further characterize malware behavior by analyzing true positives and false negatives using VirusTotal and the MITRE ATT&CK framework. To support reproducibility and further innovation, we release our dataset and source code.
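The time-aware evaluation described above can be illustrated with a small sketch: every training app must have been released before every test app, so the model never sees "future" samples. The `App` record, its field names, the cutoff date, and the sample data are hypothetical and chosen only for illustration; they are not taken from the released dataset or code.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class App(NamedTuple):
    app_id: str        # hypothetical identifier
    released: date     # verified release timestamp
    is_malware: bool


def temporal_split(apps: List[App], cutoff: date) -> Tuple[List[App], List[App]]:
    """Train on apps released before the cutoff, test on the rest."""
    ordered = sorted(apps, key=lambda app: app.released)
    train = [app for app in ordered if app.released < cutoff]
    test = [app for app in ordered if app.released >= cutoff]
    return train, test


# Hypothetical sample data.
apps = [
    App("a4", date(2023, 5, 9), True),
    App("a1", date(2020, 3, 1), False),
    App("a3", date(2022, 1, 1), False),
    App("a2", date(2021, 7, 15), True),
]

train, test = temporal_split(apps, cutoff=date(2022, 1, 1))

# Guard against temporal leakage: no training app may postdate a test app.
assert max(app.released for app in train) < min(app.released for app in test)
```

A random split over the same four apps could place the 2023 sample in the training set and the 2020 sample in the test set, which is the kind of leakage that inflates reported accuracy.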

Problem

Research questions and friction points this paper is trying to address.

temporal bias
Android malware detection
time-stamped dataset
machine learning
real-world robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Learning
Temporal Bias
BYOL
Android Malware Detection
Time-Stamped Dataset