AI Summary
Behavioral malware detectors are vulnerable to adversarial attacks in both the feature and problem spaces, and existing attacks struggle with malware's nondeterministic execution. To address this, we propose the first end-to-end adversarial framework that explicitly accounts for execution stochasticity. Our method introduces: (1) PS-FGSM, a novel gradient-based perturbation algorithm tailored for sequential behavioral features; (2) a sandbox-guided, code-level problem-space modification strategy ensuring consistent evasion across multiple executions; and (3) a unified white-box/black-box attack architecture leveraging RNN-based dynamic behavioral modeling. Evaluated on two state-of-the-art RNN-based detectors, our approach achieves an attack success rate of up to 99% while significantly reducing the number of required code modifications. It substantially outperforms existing methods in both white-box and black-box settings, demonstrating superior robustness, efficiency, and cross-execution consistency.
Abstract
Machine learning algorithms can effectively classify malware based on its dynamic behavior, but they are susceptible to adversarial attacks. Existing attacks, however, often fail to find an effective solution in both the feature and problem spaces. This failure stems from not addressing the intrinsically nondeterministic nature of malware: executing the same sample multiple times may yield significantly different behaviors. Hence, perturbations computed for one specific behavior may be ineffective for others observed in subsequent executions. In this paper, we show how an attacker can increase their chance of success by leveraging a new and more efficient feature-space algorithm for sequential data, which we name PS-FGSM, and by adopting two problem-space strategies specifically tailored to address nondeterminism. We implement our algorithm and attack strategies in Tarallo, an end-to-end adversarial framework that significantly outperforms previous work in both white-box and black-box scenarios. Our preliminary analysis, conducted in a sandboxed environment against two RNN-based malware detectors, shows that Tarallo achieves a success rate of up to 99% on both feature-space and problem-space attacks while significantly reducing the number of modifications required for misclassification.
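The nondeterminism problem described above can be illustrated with a small toy sketch. It is not PS-FGSM and not Tarallo's problem-space strategy. The detector (a fraction-of-suspicious-calls threshold), the padding perturbation, the three traces, and every name in the code are hypothetical, chosen only to show why a perturbation sized for one observed execution may fail on another.

```python
# Toy illustration only: a hypothetical detector and perturbation,
# not the RNN detectors or the attack algorithms from the paper.

SUSPICIOUS = {"WriteProcessMemory", "CreateRemoteThread", "RegSetValue"}
THRESHOLD = 0.5  # assumed decision threshold of the toy detector


def score(trace):
    """Fraction of calls in the trace that the toy detector deems suspicious."""
    return sum(call in SUSPICIOUS for call in trace) / len(trace)


def is_detected(trace):
    return score(trace) >= THRESHOLD


def pad(trace, n):
    """Hypothetical perturbation: append n benign calls to the trace."""
    return trace + ["GetTickCount"] * n


def min_padding(trace, limit=100):
    """Smallest number of appended benign calls that evades the toy detector."""
    for n in range(limit + 1):
        if not is_detected(pad(trace, n)):
            return n
    raise ValueError("no evading padding found within limit")


# Three executions of the same (imaginary) sample yielding different behaviors.
traces = [
    ["ReadFile", "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"],
    ["WriteProcessMemory", "CreateRemoteThread", "RegSetValue", "CloseHandle"],
    ["RegSetValue", "WriteProcessMemory", "CreateRemoteThread",
     "RegSetValue", "ReadFile"],
]

# Perturbation computed from the first execution only.
n_single = min_padding(traces[0])
evades_all_single = all(not is_detected(pad(t, n_single)) for t in traces)

# Perturbation sized for the worst-case execution among those observed.
# Taking the maximum works here because padding can only lower the score.
n_robust = max(min_padding(t) for t in traces)
evades_all_robust = all(not is_detected(pad(t, n_robust)) for t in traces)

print(n_single, evades_all_single)
print(n_robust, evades_all_robust)
```

In this toy setting, the padding computed from the first trace does not evade the detector on the other two traces, whereas sizing the perturbation against all observed traces does, at the cost of more modifications. The tension between robustness across executions and the number of modifications is the one the abstract describes.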