🤖 AI Summary
Problem: Android malware research is hampered by severe limitations in mainstream datasets such as Drebin: labels rely heavily on noisy VirusTotal multi-engine aggregation; samples are outdated and no longer reflect current threats; and automated labeling tools such as AVClass2 use coarse-grained aggregation strategies that compound mislabeling and mislead downstream studies. Method: We propose a framework, the first of its kind, for constructing a high-quality, industry-oriented benchmark dataset for mobile security. It integrates multi-source threat intelligence, dynamic analysis, and automated pre-screening, and adds an expert-driven human verification feedback loop to ensure fine-grained, high-fidelity family labels and continuous real-time updates. Contribution/Results: The resulting dataset achieves significantly lower label error rates than existing datasets and extends temporal coverage to the latest threats. It provides a trustworthy benchmark for evaluating detection model robustness and cross-temporal generalization, and for validating industrial-grade defenses.
📝 Abstract
The rapidly evolving Android malware ecosystem demands high-quality, real-time datasets as a foundation for effective detection and defense. As mobile devices are widely adopted across industrial systems, these devices have become a critical yet often overlooked attack surface in industrial cybersecurity. However, mainstream datasets widely used in academia and industry (e.g., Drebin) exhibit significant limitations. First, their heavy reliance on VirusTotal's multi-engine aggregation results introduces substantial label noise. Second, their outdated samples reduce their temporal relevance. Moreover, automated labeling tools (e.g., AVClass2) suffer from suboptimal aggregation strategies, which further compound labeling errors and propagate inaccuracies throughout the research community.
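To make the label-noise concern concrete, the following is a minimal sketch of how labels derived from multi-engine aggregation can depend on arbitrary choices. It is not the paper's method and not AVClass2's actual algorithm: the engine names, verdict strings, detection thresholds, and the simple substring-based plurality vote are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical per-engine verdicts for a single APK (None = reported clean).
# Engine names and label strings are invented for this illustration.
verdicts = {
    "EngineA": "Trojan.AndroidOS.FakeInst.a",
    "EngineB": "Android.Trojan.SmsSend",
    "EngineC": None,
    "EngineD": "Adware.Airpush",
    "EngineE": None,
    "EngineF": "Android/FakeInst.B",
}


def is_malicious(verdicts, threshold):
    """Binary label: malicious if at least `threshold` engines flag the sample."""
    detections = sum(1 for label in verdicts.values() if label is not None)
    return detections >= threshold


def plurality_family(verdicts, known_families):
    """Return the family name mentioned by the most engines, or None.

    This is a deliberately coarse stand-in for an aggregation strategy:
    it matches lowercase family names as substrings of each engine label.
    """
    counts = Counter()
    for label in verdicts.values():
        if label is None:
            continue
        lowered = label.lower()
        for family in known_families:
            if family in lowered:
                counts[family] += 1
    return counts.most_common(1)[0][0] if counts else None


families = ["fakeinst", "smssend", "airpush"]
print(is_malicious(verdicts, threshold=2))   # 4 engines flag the sample
print(is_malicious(verdicts, threshold=5))   # same sample, stricter threshold
print(plurality_family(verdicts, families))
```

In this toy case the same sample is labeled malicious or benign depending only on the chosen threshold, and the family label is decided by two of six engines agreeing. This is the kind of sensitivity that motivates adding expert verification on top of automated aggregation.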