On the Security Risks of ML-based Malware Detection Systems: A Survey

📅 2025-05-16

🤖 AI Summary

This paper addresses the lack of systematic security analysis for ML-based malware detection systems deployed in practice. It uses the CIA (Confidentiality, Integrity, Availability) principles to define the scope of security risks and develops a stage-based taxonomy covering four stages: data collection, feature engineering, model training, and system deployment. Using this taxonomy, it surveys prevalent attacks at each stage, including data poisoning, adversarial examples, and model extraction, along with the corresponding defenses, and then conducts two case studies from inter-stage and intra-stage perspectives. The analysis reveals critical defense gaps, particularly in feature engineering and deployment. Key contributions are: (1) a CIA-aligned, end-to-end security risk taxonomy for ML-based malware detection; (2) a holistic, stage-aware classification of attacks and defenses; and (3) the identification of structural defense deficiencies, together with suggested directions for cross-stage collaborative defense.
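A CIA-by-stage classification like the one described above can be pictured as a small lookup structure that supports both an intra-stage view (which attacks touch one stage) and an inter-stage view (which attacks span several stages). The sketch below is illustrative only: the stage names follow the summary, but the attack-to-stage and attack-to-CIA assignments are common textbook associations chosen as assumptions, not the paper's actual taxonomy, and the `Attack` type and the `by_stage` and `cross_stage` helpers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DATA_COLLECTION = "data collection"
    FEATURE_ENGINEERING = "feature engineering"
    MODEL_TRAINING = "model training"
    DEPLOYMENT = "system deployment"


class CIA(Enum):
    CONFIDENTIALITY = "C"
    INTEGRITY = "I"
    AVAILABILITY = "A"


@dataclass(frozen=True)
class Attack:
    name: str
    stages: frozenset    # stages the attack touches (illustrative assumption)
    violates: frozenset  # CIA properties it undermines (illustrative assumption)


# Hypothetical entries for illustration; not the survey's classification.
ATTACKS = [
    Attack("data poisoning",
           frozenset({Stage.DATA_COLLECTION, Stage.MODEL_TRAINING}),
           frozenset({CIA.INTEGRITY, CIA.AVAILABILITY})),
    Attack("adversarial examples",
           frozenset({Stage.DEPLOYMENT}),
           frozenset({CIA.INTEGRITY})),
    Attack("model extraction",
           frozenset({Stage.DEPLOYMENT}),
           frozenset({CIA.CONFIDENTIALITY})),
]


def by_stage(stage):
    """Intra-stage view: names of attacks that touch the given stage."""
    return [a.name for a in ATTACKS if stage in a.stages]


def cross_stage():
    """Inter-stage view: names of attacks that span more than one stage."""
    return [a.name for a in ATTACKS if len(a.stages) > 1]


print(by_stage(Stage.DEPLOYMENT))  # ['adversarial examples', 'model extraction']
print(cross_stage())               # ['data poisoning']
```

Keeping stages and CIA properties as sets on each entry lets the same table answer both kinds of question without duplicating attacks under several headings.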

📝 Abstract

Malware presents a persistent threat to user privacy and data integrity. To combat this, machine learning-based (ML-based) malware detection (MD) systems have been developed. However, these systems have increasingly been attacked in recent years, undermining their effectiveness in practice. While the security risks associated with ML-based MD systems have garnered considerable attention, most prior work is limited to adversarial malware examples and lacks a comprehensive analysis of practical security risks. This paper addresses this gap by using the CIA (Confidentiality, Integrity, Availability) principles to define the scope of security risks. We then deconstruct ML-based MD systems into distinct operational stages, yielding a stage-based taxonomy. Using this taxonomy, we summarize the technical progress and discuss the gaps in the attack and defense proposals for ML-based MD systems within each stage. Subsequently, we conduct two case studies, using both inter-stage and intra-stage analyses according to the stage-based taxonomy, to provide new empirical insights. Based on these analyses and insights, we suggest potential future directions from both inter-stage and intra-stage perspectives.
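As the abstract notes, most prior work focuses on adversarial malware examples. As a rough illustration of what such an attack can look like in feature space, the sketch below applies a greedy, add-only perturbation to a toy linear detector with binary features. The function name `add_only_evasion`, the weights, and the restriction to only adding features (a common assumption intended to keep the malware functional) are all hypothetical choices for illustration; this is not a method taken from the survey.

```python
import numpy as np


def add_only_evasion(x, w, b, max_changes=10):
    """Greedy feature-addition evasion against a toy linear detector.

    Score s(x) = w @ x + b; s > 0 means "malware", s <= 0 means "benign".
    Only 0 -> 1 flips are allowed (an assumption meant to preserve the
    sample's original malicious behaviour). Returns the modified vector
    and whether it now evades detection.
    """
    x_adv = x.copy()  # leave the caller's sample untouched
    # Candidates: absent features with negative weight, most negative first,
    # since adding them lowers the malware score the most.
    candidates = [i for i in np.argsort(w) if x_adv[i] == 0 and w[i] < 0]
    for i in candidates[:max_changes]:
        if w @ x_adv + b <= 0:  # already classified as benign
            break
        x_adv[i] = 1
    return x_adv, bool(w @ x_adv + b <= 0)


# Hypothetical toy detector and sample.
w = np.array([2.0, 1.5, -1.0, -0.5, -2.0])
b = -1.0
x = np.array([1, 1, 0, 0, 0])  # initial score 2.5 -> flagged as malware

x_adv, evaded = add_only_evasion(x, w, b)
print(x_adv.tolist(), evaded)  # [1, 1, 1, 0, 1] True
```

Here two added features push the score from 2.5 to -0.5, which is why defenses that only consider feature removal, or only inspect the deployed model, can miss this class of attack.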

Problem

Research questions and friction points this paper is trying to address.

Analyzing security risks in ML-based malware detection systems

Developing a stage-based taxonomy for comprehensive risk assessment

Providing empirical insights via inter-stage and intra-stage case studies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the CIA principles to define the scope of security risks

Deconstructs ML-based MD systems into operational stages

Conducts inter-stage and intra-stage case studies

🔎 Similar Papers

Explainable Artificial Intelligence (XAI) for Malware Analysis: A Survey of Techniques, Applications, and Open Challenges