Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection

📅 2025-08-13

🤖 AI Summary

This paper investigates how signature-based rules interact with the training phase of AI-based Windows malware detection systems. To address the vulnerability of conventional end-to-end learning to adversarial examples and temporal drift, it proposes a “rule-preceding filtering” paradigm: only samples that the signature rules fail to detect are used to train the machine learning model, which explicitly steers the model toward the regions the rules cannot see. The authors present this as the first systematic empirical analysis of rule–AI coupling during model training. Experiments covering adversarial robustness, cross-temporal performance, and ablations show that the approach improves generalization to novel variants and to temporal distribution shift, while also increasing resilience to adversarial examples. However, performance is bounded by the quality of the signature rules: goodware that a rule wrongly flags never reaches the model, so it cannot be corrected, which imposes a fixed lower bound on false positives.
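The filtering idea can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the signatures are hypothetical byte patterns rather than real rules (production systems typically use richer engines such as YARA), the feature vectors and files are invented, and a nearest-centroid classifier stands in for whatever model the authors actually train. The names `rule_preceding_filter`, `NearestCentroid`, and `detect` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    raw: bytes                   # file bytes, inspected by the signature rules
    features: tuple[float, ...]  # hypothetical static features for the ML model
    label: int                   # 1 = malware, 0 = goodware


def flagged_by_signatures(raw: bytes, signatures: Sequence[bytes]) -> bool:
    # Hypothetical byte-pattern rules; real systems use richer engines (e.g. YARA).
    return any(sig in raw for sig in signatures)


def rule_preceding_filter(samples: Sequence[Sample],
                          signatures: Sequence[bytes]) -> list[Sample]:
    # Keep only the rule-blind samples: these form the model's training set.
    return [s for s in samples if not flagged_by_signatures(s.raw, signatures)]


class NearestCentroid:
    """Stand-in classifier, not the model used in the paper."""

    def fit(self, samples: Sequence[Sample]) -> "NearestCentroid":
        if not samples:
            raise ValueError("every training sample was flagged by the rules")
        self.centroids: dict[int, tuple[float, ...]] = {}
        for label in {s.label for s in samples}:
            group = [s.features for s in samples if s.label == label]
            # Per-dimension mean of this class's feature vectors.
            self.centroids[label] = tuple(sum(col) / len(group) for col in zip(*group))
        return self

    def predict(self, features: Sequence[float]) -> int:
        # Return the label whose centroid is closest in squared Euclidean distance.
        def sq_dist(c: tuple[float, ...]) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, c))
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


def detect(raw: bytes, features: Sequence[float],
           signatures: Sequence[bytes], model: NearestCentroid) -> int:
    # Inference cascade: a rule match is final; the model only judges what rules miss.
    if flagged_by_signatures(raw, signatures):
        return 1
    return model.predict(features)


# Toy usage with invented data.
signatures = [b"EVIL"]
train = [
    Sample(b"MZ..EVIL..", (0.9, 0.8), 1),  # caught by a rule -> excluded from training
    Sample(b"MZ..packed", (0.8, 0.9), 1),  # rule-blind malware -> kept
    Sample(b"MZ..hello", (0.1, 0.2), 0),
    Sample(b"MZ..world", (0.2, 0.1), 0),
]
kept = rule_preceding_filter(train, signatures)
model = NearestCentroid().fit(kept)
print(len(kept))                                            # 3
print(detect(b"MZ..other", (0.7, 0.7), signatures, model))  # 1
print(detect(b"MZ..EVIL", (0.1, 0.1), signatures, model))   # 1: rule match overrides the model
```

The last call shows the cost of this design: a file whose features look like goodware is still flagged because it contains a rule pattern, and the model never gets a chance to overrule it. Rule quality therefore bounds the false-positive rate from below.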

📝 Abstract

Malware detection increasingly relies on AI systems that integrate signature-based detection with machine learning. However, these components are typically developed in isolation and only combined afterwards, missing opportunities to reduce data complexity and strengthen defenses against adversarial EXEmples, carefully crafted programs designed to evade detection. Hence, in this work we investigate the influence that signature-based detection exerts on model training when signatures are included in the training pipeline. Specifically, we compare models trained on a comprehensive dataset with an AI system whose machine learning component is trained solely on samples not already flagged by signatures. Our results demonstrate improved robustness to both adversarial EXEmples and temporal data drift, although this comes at the cost of a fixed lower bound on false positives, driven by suboptimal rule selection. We conclude by discussing these limitations and outlining how future research could extend AI-based malware detection to include dynamic analysis, thereby further enhancing system resilience.
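Why rule selection fixes a lower bound on false positives can be made explicit. Assume (notation introduced here for illustration, not taken from the paper) that the system flags a file $x$ with ground-truth label $y$ ($y=0$ for goodware) whenever a signature fires, $R(x)=1$, or the machine learning model predicts malware, $M(x)=1$. By the law of total probability:

$$
\begin{aligned}
\mathrm{FPR}_{\text{sys}} &= P\big(R(x)=1 \lor M(x)=1 \mid y=0\big) \\
&= P\big(R(x)=1 \mid y=0\big) + P\big(R(x)=0 \mid y=0\big)\,P\big(M(x)=1 \mid R(x)=0,\, y=0\big) \\
&\ge P\big(R(x)=1 \mid y=0\big) = \mathrm{FPR}_{\text{rules}}.
\end{aligned}
$$

Since the model is trained and applied only where $R(x)=0$, it can shrink the second term but never the first: goodware matched by a poorly chosen rule remains a false positive regardless of how good the model is.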

Problem

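Investigates how signature-based detection affects the training of AI-based malware detection models

Compares a model trained on the full dataset with one whose machine learning component is trained only on samples that signatures do not flag

Explores the trade-offs among robustness, false positives, and rule selection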

Innovation

Integrates signature-based detection into the training pipeline of the machine learning model

Trains the machine learning component only on samples not flagged by signatures

Improves robustness to adversarial EXEmples and temporal data drift

🔎 Similar Papers

Explainable Artificial Intelligence (XAI) for Malware Analysis: A Survey of Techniques, Applications, and Open Challenges