🤖 AI Summary
This paper addresses the lack of systematic security analysis for ML-driven malware detection systems in real-world deployment. We propose a comprehensive security risk characterization framework based on the CIA principles (Confidentiality, Integrity, Availability) and develop a four-stage taxonomy covering data collection, feature engineering, model training, and system deployment. For the first time, we conduct empirical analysis from both cross-stage and intra-stage perspectives, systematically surveying prevalent attacks (including data poisoning, adversarial examples, and model extraction) and the corresponding defenses across all stages. Our analysis reveals critical defense gaps, particularly in feature engineering and deployment. Key contributions are: (1) the first CIA-aligned, end-to-end security risk taxonomy for ML-based malware detection; (2) the first holistic, stage-aware risk classification scheme; and (3) identification of structural defense deficiencies, which motivates verifiable, cross-stage collaborative defense strategies.
📝 Abstract
Malware presents a persistent threat to user privacy and data integrity. To combat this threat, machine learning-based (ML-based) malware detection (MD) systems have been developed. However, these systems have increasingly come under attack in recent years, undermining their effectiveness in practice. Although the security risks associated with ML-based MD systems have garnered considerable attention, most prior work is limited to adversarial malware examples and lacks a comprehensive analysis of practical security risks. This paper addresses this gap by using the CIA principles (confidentiality, integrity, availability) to define the scope of security risks. We then decompose ML-based MD systems into distinct operational stages and develop a stage-based taxonomy. Using this taxonomy, we summarize the technical progress and discuss the gaps in attack and defense proposals for ML-based MD systems within each stage. We then conduct two case studies, applying both inter-stage and intra-stage analyses according to the stage-based taxonomy, to provide new empirical insights. Based on these analyses and insights, we suggest potential future directions from both inter-stage and intra-stage perspectives.