Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses temporal bias in Android malware detection, which arises when models are trained and evaluated without regard to apps' release timestamps. To mitigate it, the authors propose a time-aware, obfuscation-resilient detection framework. They construct a dataset of benign and malicious apps annotated with release timestamps and introduce a timestamp-verification procedure to ensure those timestamps are accurate, so that evaluation can respect chronological order. The framework combines BYOL-based self-supervised pretraining, which learns obfuscation-resilient feature representations, with supervised classification. Under a time-aware evaluation protocol, the approach achieves 98% accuracy and an 89% F1 score. The study also characterizes malicious behavior by analyzing true positives and false negatives with VirusTotal and the MITRE ATT&CK framework. The dataset and source code are publicly released to support reproducibility and future research.
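The time-aware evaluation idea can be illustrated with a minimal Python sketch. It assumes each app carries a verified release date and uses a single hypothetical cutoff date: apps released before the cutoff form the training set, and later apps form the test set. The `App` record, `temporal_split`, and `is_temporally_consistent` are illustrative names, not the authors' code, and the paper's exact splitting protocol may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class App:
    """Hypothetical record: an app identifier, its release date, and its label."""
    sha256: str
    release_date: date
    is_malware: bool


def temporal_split(apps, cutoff):
    """Split apps so every training sample predates every test sample.

    Apps released strictly before `cutoff` go to training; apps released on
    or after `cutoff` go to testing, so no "future" app leaks into training.
    """
    train = [a for a in apps if a.release_date < cutoff]
    test = [a for a in apps if a.release_date >= cutoff]
    return train, test


def is_temporally_consistent(train, test):
    """Check the time-aware constraint: latest train date < earliest test date."""
    if not train or not test:
        return True
    return max(a.release_date for a in train) < min(a.release_date for a in test)


# Toy data (hypothetical apps, not from the paper's dataset).
apps = [
    App("a1", date(2019, 3, 1), False),
    App("a2", date(2020, 7, 15), True),
    App("a3", date(2021, 1, 10), True),
    App("a4", date(2022, 5, 2), False),
]
train, test = temporal_split(apps, cutoff=date(2021, 1, 1))
print([a.sha256 for a in train], [a.sha256 for a in test])
# ['a1', 'a2'] ['a3', 'a4']
```

A randomly shuffled split, by contrast, can place a 2022 app in training and a 2019 app in testing, letting the model learn from the future; the consistency check flags exactly that situation.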

📝 Abstract

Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and malicious Android apps and introducing a timestamp-verification procedure to ensure temporal accuracy. We then propose a detection framework that uses Bootstrap Your Own Latent (BYOL) for self-supervised pre-training to learn obfuscation-resilient representations, followed by supervised classification. Under time-aware evaluation, the method attains 98% accuracy and 89% F1. We further characterize malware behavior by analyzing true positives and false negatives using VirusTotal and the MITRE ATT&CK framework. To support reproducibility and further innovation, we release our dataset and source code.
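The abstract names BYOL as the self-supervised pre-training step without giving its details. The plain-Python sketch below shows the core of the standard BYOL objective: the online network's prediction for one augmented view is regressed onto the target network's projection of another view with a normalized squared error, and the target weights are an exponential moving average (EMA) of the online weights rather than being trained by gradients. The feature-masking augmentation, the EMA rate `tau=0.99`, and all function names are assumptions for illustration; the paper's encoder, app features, and augmentations are not described here.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (guarding against the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def byol_loss(online_prediction, target_projection):
    """BYOL regression loss: squared distance between L2-normalized vectors.

    Equals 2 - 2 * cosine_similarity, so it is 0 when the vectors point the
    same way and 4 when they point in opposite directions.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def symmetric_byol_loss(p1, z2, p2, z1):
    """Symmetrize over the two augmented views: each view's online prediction
    is matched to the other view's target projection."""
    return byol_loss(p1, z2) + byol_loss(p2, z1)


def ema_update(target_params, online_params, tau=0.99):
    """Target weights track the online weights: xi <- tau * xi + (1 - tau) * theta.

    The target network receives no gradient updates, only this moving average.
    """
    return [tau * xi + (1.0 - tau) * theta
            for xi, theta in zip(target_params, online_params)]


def random_feature_mask(features, drop_prob=0.2, rng=random):
    """Hypothetical augmentation for an app feature vector: zero out entries at random."""
    return [0.0 if rng.random() < drop_prob else x for x in features]


if __name__ == "__main__":
    print(byol_loss([1.0, 0.0], [2.0, 0.0]))  # 0.0 (same direction)
    print(byol_loss([1.0, 0.0], [0.0, 3.0]))  # 2.0 (orthogonal)
```

In a full setup, this loss would be minimized with respect to the online parameters only; the online encoder is then kept, and a supervised classifier is trained on its representations, matching the two-stage pipeline the abstract describes.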

Problem

Research questions and friction points this paper is trying to address.

temporal bias

Android malware detection

time-stamped dataset

machine learning

real-world robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Learning

Temporal Bias

BYOL