🤖 AI Summary
This study systematically investigates how Android app obfuscation and other transformations affect online malware detection. Method: We propose a data-driven attribution framework that models antivirus engine behavior, generating traceable samples via six controlled transformations and collecting 971,000 multi-engine detection reports for 179,000 apps from VirusTotal over one month. The reports are analyzed from the perspectives of signature-based, static analysis-based, and dynamic analysis-based detection. Contribution/Results: We quantitatively characterize how robust these three detection paradigms are against transformations, the first such empirical analysis, and identify the code- and behavior-level features that drive detection decisions. Seven highlighted findings expose working mechanisms inside antivirus engines and help explain why detection fails. The framework makes black-box antivirus decisions more interpretable and attributable, providing both empirical foundations and methodological guidance for improving detection models and conducting adversarial evaluations.
📝 Abstract
It is well known that antivirus engines are vulnerable to evasion techniques (e.g., obfuscation) that transform malware into variants. However, a missed detection cannot necessarily be attributed to the effectiveness of these evasions; the limitations of the engines themselves may also contribute to this unsatisfactory result. In this study, we propose a data-driven approach to measure the effect of app transformations on malware detection, and to explain why the engines produce the detection results they do. First, we develop an interaction model for antivirus engines that illustrates how they return different detection results in response to varying inputs. We implement six app transformation techniques to generate a large number of Android apps with traceable changes. We then track the detection results of these apps across multiple antivirus engines for one month, obtaining over 971K detection reports from VirusTotal for 179K apps in total. Last, we conduct a comprehensive analysis of antivirus engines based on these reports from the perspectives of signature-based, static analysis-based, and dynamic analysis-based detection techniques. The results, together with 7 highlighted findings, reveal a number of hidden working mechanisms inside antivirus engines and identify the indicators of compromise in apps that the engines rely on during malware detection.
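The kind of attribution the abstract describes, comparing each engine's verdict on an app before and after a traceable transformation, could be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record layout, the engine names, and the transformation labels (`rename`, `repackage`) are hypothetical placeholders, not the six techniques from the paper or the VirusTotal report format.

```python
from collections import defaultdict

# Hypothetical report records: (app_id, transformation, {engine: flagged?}).
# "original" marks the untransformed app; all names are placeholders.
REPORTS = [
    ("app1", "original", {"EngineA": True, "EngineB": True, "EngineC": True}),
    ("app1", "rename", {"EngineA": True, "EngineB": False, "EngineC": True}),
    ("app1", "repackage", {"EngineA": False, "EngineB": False, "EngineC": True}),
]


def evasion_by_engine(reports):
    """Return {(transformation, engine): evasion rate}.

    The rate is the fraction of apps that an engine flagged in their
    original form but no longer flags after the given transformation.
    """
    # Index each app's verdicts on its untransformed version.
    baseline = {}
    for app_id, transformation, verdicts in reports:
        if transformation == "original":
            baseline[app_id] = verdicts

    evaded = defaultdict(int)
    total = defaultdict(int)
    for app_id, transformation, verdicts in reports:
        if transformation == "original" or app_id not in baseline:
            continue
        for engine, flagged_before in baseline[app_id].items():
            # Only count engines that detected the original and also
            # returned a verdict for the transformed variant.
            if not flagged_before or engine not in verdicts:
                continue
            key = (transformation, engine)
            total[key] += 1
            if not verdicts[engine]:
                evaded[key] += 1

    return {key: evaded[key] / total[key] for key in total}


rates = evasion_by_engine(REPORTS)
```

Restricting the count to engines that flagged the original app separates the effect of a transformation from an engine's baseline misses, which is the distinction the abstract draws between effective evasion and engine limitations.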