Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Static malware classifiers often exhibit bias and poor interpretability due to their reliance on non-semantic artifacts such as packing. This work proposes a reproducible diagnostic framework that integrates the TRUSTEE post-hoc explainability method, feature importance ranking, and controlled dataset experiments to systematically assess model sensitivity to non-semantic features. Through fine-grained analysis of PE metadata and string n-grams, the study reveals that top discriminative features predominantly originate from packing artifacts, demonstrating that classifiers frequently conflate packing behavior with malicious intent. Furthermore, the research highlights the substantial influence of dataset composition on model decisions. These findings provide empirical evidence and actionable guidance for developing semantically robust static malware detection models.
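
The surrogate-and-rank idea in the summary can be illustrated with a small, self-contained sketch. It does not use TRUSTEE or its API, and it is not the authors' pipeline: it distills a stand-in black box into one-feature threshold rules and ranks features by fidelity, meaning how often the rule agrees with the black box. The feature names, the `black_box` function, and the synthetic samples are all hypothetical.

```python
import random

# Hypothetical feature names, chosen only for illustration.
FEATURES = ["section_entropy", "num_imports", "has_upx_string"]

def black_box(x):
    # Stand-in for an opaque classifier (assumption): it keys on packing artifacts.
    return 1 if (x["section_entropy"] > 7.0 or x["has_upx_string"] == 1) else 0

def sample(rng):
    # Synthetic feature vector; no real PE file is parsed here.
    return {
        "section_entropy": rng.uniform(4.0, 8.0),
        "num_imports": rng.randint(0, 200),
        "has_upx_string": rng.randint(0, 1),
    }

def best_stump_fidelity(samples, preds, feature):
    """Best agreement with the black box achievable by one threshold on `feature`."""
    best = 0.0
    for t in sorted({s[feature] for s in samples}):
        agree = sum((s[feature] >= t) == bool(p) for s, p in zip(samples, preds))
        # Allow the rule or its negation, whichever agrees more often.
        fidelity = max(agree, len(samples) - agree) / len(samples)
        best = max(best, fidelity)
    return best

rng = random.Random(0)
samples = [sample(rng) for _ in range(1000)]
preds = [black_box(s) for s in samples]

# Rank features by how well a single-feature surrogate mimics the black box.
ranking = sorted(
    ((f, best_stump_fidelity(samples, preds, f)) for f in FEATURES),
    key=lambda item: item[1],
    reverse=True,
)
for name, fidelity in ranking:
    print(f"{name}: fidelity={fidelity:.3f}")
```

In this toy setup the packer-string flag ranks first because it alone reproduces most of the black box's decisions. A manual review step would then ask whether such a top-ranked feature reflects malicious behavior or only the wrapper.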

📝 Abstract

Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, packing and other non-semantic transformations applied to executable files limit their reliability. Because packing is strongly associated with maliciousness, malware classifiers often learn these incidental artifacts rather than the true behavior of the binary. Moreover, these classifiers are black boxes, making it difficult to understand what they learn. To address this issue, we propose a two-part framework that applies the post-hoc explainability (XAI) tool TRUSTEE and then manually analyzes the top-ranked features. We conducted several controlled experiments that vary the dataset composition ratios to understand their impact on the results. Across all experiments, the top-ranked features identified by TRUSTEE were predominantly packing artifacts, Portable Executable (PE) metadata, and string-level n-grams rather than malicious semantics. These results suggest that such classifiers are highly sensitive to dataset composition and can mistake packing for malicious behavior. Our proposed framework allows for the reproducible diagnosis of such biases and offers guidance for building more robust and semantically meaningful malware detection models.
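
Why dataset composition matters can be shown with a minimal sketch under stated assumptions. The code below does not reproduce the paper's experiments or data. It builds synthetic datasets in which the packed fraction of each class is a free parameter (the function and parameter names are hypothetical) and measures how well the shortcut rule "malicious if and only if packed" performs.

```python
import random

def make_dataset(n, frac_malicious, p_packed_mal, p_packed_ben, seed=0):
    """Return (packed, label) pairs with label 1 = malicious (synthetic data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < frac_malicious else 0
        p_packed = p_packed_mal if label else p_packed_ben
        packed = 1 if rng.random() < p_packed else 0
        rows.append((packed, label))
    return rows

def shortcut_accuracy(rows):
    """Accuracy of the artifact-only rule 'malicious iff packed'."""
    return sum(packed == label for packed, label in rows) / len(rows)

# Skewed composition: most malware is packed, most benign files are not.
skewed = make_dataset(10_000, 0.5, p_packed_mal=0.9, p_packed_ben=0.1)
# Controlled composition: packing is equally common in both classes.
controlled = make_dataset(10_000, 0.5, p_packed_mal=0.5, p_packed_ben=0.5)

print("skewed:", round(shortcut_accuracy(skewed), 3))
print("controlled:", round(shortcut_accuracy(controlled), 3))
```

With the skewed ratios the shortcut scores close to 0.9 in expectation, while with matched ratios it falls to about chance. A classifier trained on the skewed set can therefore look accurate while learning only the packing signal, which is the failure mode the controlled experiments are meant to expose.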

Problem

Research questions and friction points this paper is trying to address.

static malware classification

artifact reliance

packing

interpretability

dataset bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

artifact reliance

static malware classification

TRUSTEE