🤖 AI Summary
This work identifies and empirically validates a novel label spoofing attack: adversaries inject minimal, undetectable bytecode patterns into benign Android applications to induce misclassification by AntiVirus (AV) engines on crowdsourced platforms such as VirusTotal, thereby polluting training data and enabling poisoning of downstream ML models. The attack paradigm is proposed for the first time and implemented via AndroVenom, a framework for Android bytecode-level pattern injection. By modeling AV engine behavior, the attack achieves targeted misprediction of specific unaltered benign samples while modifying only 0.015% of the training data, and it evades anomaly detection on the training set. At a mere 1% poisoning rate, mainstream ML-based detectors, including CNNs, RNNs, and GBDTs, suffer denial-of-service-level degradation. Critically, state-of-the-art feature extractors fail to filter the injected patterns, and multiple advanced models exhibit precise decision manipulation. These findings expose fundamental vulnerabilities in ML-driven Android malware detection pipelines that rely on crowdsourced labels.
📝 Abstract
Machine learning (ML) malware detectors rely heavily on crowd-sourced AntiVirus (AV) labels, with platforms like VirusTotal serving as a trusted source of malware annotations. But what if attackers could manipulate these labels to make benign software look malicious? We introduce label spoofing attacks, a new threat that contaminates crowd-sourced datasets by embedding minimal, undetectable malicious patterns into benign samples. These patterns coerce AV engines into misclassifying legitimate files as harmful, enabling poisoning attacks against ML-based malware classifiers trained on those data. We demonstrate this scenario by developing AndroVenom, a methodology for polluting realistic data sources and mounting the resulting poisoning attacks against ML malware detectors. Experiments show that not only are state-of-the-art feature extractors unable to filter out the injected patterns, but various ML models also suffer Denial of Service with as little as 1% poisoned samples. Additionally, attackers can flip the decisions on specific unaltered benign samples by modifying only 0.015% of the training data, threatening those samples' reputation and market share, without being stopped by anomaly detectors applied to the training data. We conclude by raising the alarm on the trustworthiness of training processes based on AV annotations, calling for further investigation into how to produce reliable labels for ML malware detectors.
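The attack chain described above (inject a pattern into benign apps, let AV engines mislabel them, then train a detector on the spoofed labels) can be illustrated with a toy simulation. This is not the paper's AndroVenom implementation: it is a minimal NumPy sketch under simplified, hypothetical assumptions, where random vectors stand in for extracted bytecode features, a constant value in one reserved feature slot stands in for the injected pattern, and the label flip on poisoned samples stands in for the spoofed AV verdicts.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy feature dimensionality (stand-in for extracted bytecode features)

def make_samples(n, malicious):
    # benign and malicious apps differ in the mean of their feature vectors
    mean = 1.0 if malicious else -1.0
    X = rng.normal(mean, 1.0, size=(n, d))
    X[:, -1] = 0.0  # last slot is reserved for the "injected pattern"; normally absent
    return X

def inject_trigger(X):
    # hypothetical stand-in for AndroVenom's bytecode pattern injection:
    # a distinctive constant in the reserved feature slot
    X = X.copy()
    X[:, -1] = 8.0
    return X

# training set: 1000 benign + 1000 malicious apps, labels from (simulated) AV engines
Xb, Xm = make_samples(1000, False), make_samples(1000, True)
n_poison = 10  # 1% of the benign apps carry the injected pattern
Xb[:n_poison] = inject_trigger(Xb[:n_poison])
X = np.vstack([Xb, Xm])
y = np.concatenate([np.zeros(1000), np.ones(1000)])
y[:n_poison] = 1.0  # label spoofing: AV engines mislabel the patterned apps as malware

# plain logistic regression trained on the poisoned, AV-labeled dataset
w, b = np.zeros(d), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.5 * (X.T @ g) / len(y)
    b -= 0.5 * g.mean()

def predict(X):
    return (X @ w + b > 0).astype(int)

benign = make_samples(5, False)
print("clean benign:    ", predict(benign))
print("with pattern:    ", predict(inject_trigger(benign)))
print("trigger weight:  ", w[-1])  # positive: the model learned the pattern as malicious
```

Even at this 1% poisoning rate the model assigns a positive weight to the injected pattern, so the same pattern raises the malicious score of any benign app it is planted in, mirroring the targeted decision flips the paper reports against far stronger detectors.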