Empirical Quantification of Spurious Correlations in Malware Detection

📅 2025-06-11
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study identifies a critical vulnerability in deep learning–based malware detection: models heavily rely on compiler-introduced, semantically vacuous features—particularly null bytes—thereby undermining their capacity to learn genuine code semantics. To address this, we construct a balanced dataset and employ gradient-based attribution, systematic perturbation analysis, and ablation studies to empirically quantify the contribution of spurious correlations to model decisions—the first such characterization in this domain. We further propose a robustness evaluation framework tailored for production deployment. Results demonstrate that state-of-the-art models exhibit significant dependence on null bytes rather than executable logic; one model, after targeted optimization, reduces sensitivity to these spurious features by 42%, achieving improved semantic focus and operational suitability. This work provides both theoretical insights and practical methodologies to enhance the trustworthiness and generalizability of malware detection systems.

Technology Category

Application Category

📝 Abstract
End-to-end deep learning exhibits unmatched performance for detecting malware, but such an achievement is reached by exploiting spurious correlations -- features with high relevance at inference time, but known to be useless through domain knowledge. While previous work highlighted that deep networks mainly focus on metadata, none investigated the phenomenon further, without quantifying their impact on the decision. In this work, we deepen our understanding of how spurious correlation affects deep learning for malware detection by highlighting how much models rely on empty spaces left by the compiler, which diminishes the relevance of the compiled code. Through our seminal analysis on a small-scale balanced dataset, we introduce a ranking of two end-to-end models to better understand which is more suitable to be put in production.
Problem

Research questions and friction points this paper is trying to address.

Quantify spurious correlations in malware detection models
Assess reliance on compiler metadata over compiled code
Rank end-to-end models for production suitability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantify spurious correlations in malware detection
Rank end-to-end models for production suitability
Analyze compiler empty spaces impact on models
🔎 Similar Papers
No similar papers found.