ThreatIntel-Andro: Expert-Verified Benchmarking for Robust Android Malware Research

📅 2025-10-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Android malware research is hampered by severe limitations in mainstream datasets (e.g., Drebin): labels rely heavily on noisy VirusTotal multi-engine aggregation; samples are outdated and lack temporal relevance; and automated classification tools (e.g., AVClass2) employ coarse-grained strategies that exacerbate mislabeling and misguide downstream studies. Method: We propose the first high-quality, industrially oriented framework for constructing benchmark datasets for mobile security. It integrates multi-source threat intelligence, dynamic analysis, and automated pre-screening, and adds an expert-driven human verification feedback loop to ensure fine-grained, high-fidelity family labeling and continuous real-time updates. Contribution/Results: Our dataset achieves significantly lower label error rates and extends temporal coverage to the latest threats. It provides an authoritative, trustworthy benchmark for assessing detection model robustness, cross-temporal generalization, and industrial-grade defense validation.

📝 Abstract

The rapidly evolving Android malware ecosystem demands high-quality, real-time datasets as a foundation for effective detection and defense. As mobile devices are widely adopted across industrial systems, they have become a critical yet often overlooked attack surface in industrial cybersecurity. However, mainstream datasets widely used in academia and industry (e.g., Drebin) exhibit significant limitations: first, their heavy reliance on VirusTotal's multi-engine aggregation results introduces substantial label noise; second, their outdated samples reduce temporal relevance. Moreover, automated labeling tools (e.g., AVClass2) suffer from suboptimal aggregation strategies, which compound labeling errors and propagate inaccuracies throughout the research community.
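To make the label-noise concern concrete, the following is a minimal sketch of plurality voting over per-engine family labels. It is not AVClass2's algorithm and not the paper's pipeline; the engine names, family names, generic-token list, and the `min_agreement` threshold are all hypothetical. It only illustrates how a naive vote can hide disagreement, and how low-agreement samples could be flagged for expert review.

```python
from collections import Counter

# Assumed list of uninformative tokens; real tools use richer taxonomies.
GENERIC_TOKENS = {"trojan", "malware", "generic", "android", "riskware"}


def plurality_family(engine_labels, min_agreement=0.5):
    """Return (family, agreement, needs_review) for one sample.

    engine_labels maps a (hypothetical) engine name to the family string
    it reported, or None if the engine reported nothing.
    """
    votes = Counter(
        label.lower()
        for label in engine_labels.values()
        if label and label.lower() not in GENERIC_TOKENS
    )
    if not votes:
        # No informative label at all: always send to a human.
        return None, 0.0, True

    ranked = votes.most_common(2)
    family, top = ranked[0]
    agreement = top / sum(votes.values())
    # A tie for first place means the "winner" is arbitrary.
    tie = len(ranked) > 1 and ranked[1][1] == top
    return family, agreement, tie or agreement < min_agreement


# Illustrative report with made-up engines and families.
report = {
    "EngineA": "fam_a",
    "EngineB": "fam_a",
    "EngineC": "fam_b",
    "EngineD": "generic",
    "EngineE": None,
}
result = plurality_family(report)
```

In this sketch, a sample whose engines split evenly between two families still receives a single label from the vote, which is the kind of silent error an expert verification step is meant to catch.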

Problem

Research questions and friction points this paper is trying to address.

Label noise in Android malware datasets caused by reliance on VirusTotal multi-engine aggregation

Outdated malware samples that lack temporal relevance for detection research

Suboptimal aggregation in automated labeling tools (e.g., AVClass2), which compounds labeling errors and propagates them through downstream research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-verified Android malware dataset for benchmarking

Real-time samples to ensure temporal relevance

Lower label noise than automated labeling tools alone
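Temporal relevance matters most when a detector is evaluated on samples newer than its training data. The sketch below shows one common way to set up such a cross-temporal evaluation: a time-ordered split. The `first_seen` field, the sample records, and the cutoff date are assumptions for illustration and are not taken from the dataset.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split samples so training data strictly precedes test data.

    Each sample is a dict with a hypothetical 'first_seen' date field.
    """
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test


# Made-up records; hashes and families are placeholders.
samples = [
    {"sha256": "aa", "family": "fam_a", "first_seen": date(2022, 3, 1)},
    {"sha256": "bb", "family": "fam_b", "first_seen": date(2023, 7, 15)},
    {"sha256": "cc", "family": "fam_a", "first_seen": date(2024, 11, 2)},
]
train, test = temporal_split(samples, cutoff=date(2024, 1, 1))
```

A random split would mix old and new samples in both sets, which tends to overstate how well a model handles threats that appear after training.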

🔎 Similar Papers

Reassessing feature-based Android malware detection in a contemporary context

2023-01-30, Citations: 5

Revisiting Static Feature-Based Android Malware Detection

2024-09-11, arXiv.org, Citations: 1
