Breaking Out from the TESSERACT: Reassessing ML-based Malware Detection under Spatio-Temporal Drift

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current machine learning–based Android malware detectors show substantial performance discrepancies across two representative datasets even under identical time frames, unified sampling protocols, and standardized evaluation criteria, revealing critical reliability gaps in existing evaluation paradigms for real-world deployment. Method: we systematically analyze spatio-temporal drift and identify five novel categories of spatio-temporal bias. Using two widely adopted benchmark datasets, we perform reproducible evaluations of classifiers used or proposed at five top-tier security and ML venues. Results: classifier performance is far more sensitive to the time and origin of data collection than previously assumed. To address this, we propose an actionable evaluation framework, covering dataset partitioning strategies, temporal constraints, and spatial representativeness, that improves evaluation fidelity and consistency. This work provides a methodological foundation for robust, trustworthy Android malware detection research.
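One of the spatial constraints the summary alludes to is that a test set's malware ratio should reflect an assumed in-the-wild ratio rather than the (often malware-heavy) ratio of the collected dataset. Below is a minimal sketch of that idea; the function name, the `(features, is_malware)` sample shape, and the target ratio used in the usage example are illustrative assumptions, not details from the paper.

```python
import random

def enforce_spatial_constraint(test_samples, target_malware_ratio, seed=0):
    """Downsample goodware so the test set's malware ratio matches an
    assumed in-the-wild ratio (a TESSERACT-style spatial constraint).
    `test_samples` is a list of (features, is_malware) pairs."""
    rng = random.Random(seed)
    malware = [s for s in test_samples if s[1]]
    goodware = [s for s in test_samples if not s[1]]
    # Keep all malware; sample just enough goodware to hit the target ratio.
    n_good = int(len(malware) * (1 - target_malware_ratio) / target_malware_ratio)
    return malware + rng.sample(goodware, min(n_good, len(goodware)))

# Illustrative usage: 20 malware and 500 goodware, rebalanced to 10% malware.
test_set = [("mal", True)] * 20 + [("good", False)] * 500
balanced = enforce_spatial_constraint(test_set, target_malware_ratio=0.1)
```

Keeping all malware and subsampling only goodware preserves the rarer class while fixing the evaluation ratio, which is why metrics computed on `balanced` are closer to what a deployed detector would see.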

📝 Abstract
Several recent works focused on the best practices for applying machine learning to cybersecurity. In the context of malware, TESSERACT highlighted the impact of concept drift on detection performance and suggested enforcing temporal and spatial constraints to ensure realistic time-aware evaluations, which have been adopted by the community. In this paper, we demonstrate striking discrepancies in the performance of learning-based malware detection across the same time frame when evaluated on two representative Android malware datasets used in top-tier security conferences, both adhering to established sampling and evaluation guidelines. This questions our ability to understand how current state-of-the-art approaches would perform in realistic scenarios. To address this, we identify five novel temporal and spatial bias factors that affect realistic evaluations. We thoroughly evaluate the impact of these factors in the Android malware domain on two representative datasets and five Android malware classifiers used or proposed in top-tier security conferences. For each factor, we provide practical and actionable recommendations that the community should integrate into their methodology for more realistic and reproducible settings.
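The temporal constraint the abstract refers to requires that every training sample predate every test sample, so the classifier is never trained on "future" apps. A minimal sketch of such a time-aware split, assuming samples are simple `(timestamp, is_malware)` pairs (the dates and function name are illustrative, not from the paper):

```python
from datetime import date

def temporal_split(samples, split_date):
    """Time-aware split: train strictly on samples before `split_date`,
    test on samples at or after it, so no future data leaks into training."""
    train = [s for s in samples if s[0] < split_date]
    test = [s for s in samples if s[0] >= split_date]
    return train, test

# Illustrative usage on hypothetical (timestamp, is_malware) records.
samples = [
    (date(2014, 1, 10), False),
    (date(2014, 3, 5), True),
    (date(2014, 7, 21), False),
    (date(2015, 2, 2), True),
    (date(2015, 6, 30), False),
    (date(2015, 11, 12), True),
]
train, test = temporal_split(samples, date(2015, 1, 1))
```

This contrasts with a random k-fold split, which mixes past and future samples and thereby inflates measured detection performance under concept drift.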
Problem

Research questions and friction points this paper is trying to address.

Evaluating ML-based malware detection under spatio-temporal drift
Identifying bias factors in Android malware datasets
Improving reproducibility in malware detection evaluations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies five novel bias factors
Evaluates impact on Android malware
Provides actionable evaluation recommendations