🤖 AI Summary
This study addresses the challenge of attributing Android residential proxy apps to specific proxy networks, a task hindered by extensive code reuse, embedded SDKs, and obfuscation. The authors propose a static-analysis pipeline that combines control-flow graphs, function-call graphs, and binary behavioral capability vectors. Weisfeiler-Lehman graph kernel features capture structural similarity, and classifier decisions are mapped to automatically generated Yara rules for interpretability. Evaluated on 3,365 labeled samples from four commercial proxy networks under 5-fold DEX-grouped cross-validation, an SGD classifier achieves a macro F1-score of 0.985; after non-discriminative signatures are filtered out, the Yara rules reach per-family accuracies of up to 88.45%. Of the applications still available through APKPure, 51.4% contain embedded proxy SDK code, and 23 developers have published other applications with the same functionality, which suggests ongoing commercial relationships between proxy providers and developers.
📝 Abstract
Android residential proxy applications represent a growing class of potentially unwanted programs (PUPs) that covertly route third-party traffic through end-user devices, enabling ad fraud, credential abuse, and evasion of geolocation controls by sophisticated threat actors. Attributing an unknown APK to a specific proxy network remains challenging because of code reuse, SDK embedding, and obfuscation across proxy families.
We present a static-analysis pipeline for automated proxyware family attribution, extracting graph-structured representations (control-flow and function-call graphs) and behavioral signatures from a labeled corpus of 3,365 Android proxy apps spanning four commercial proxy networks. We evaluate Weisfeiler-Lehman graph kernel features, both alone and fused with binary capability vectors, across multiple classifiers. Using 5-fold DEX-grouped cross-validation to prevent data leakage, an SGD classifier achieves a macro F1 of 0.985 on the expanded dataset. To support explainability, we map classifier decisions to automatically generated Yara rules, achieving per-family accuracies of up to 88.45% after filtering non-discriminative signatures.
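To make the graph-kernel idea concrete, the following is a minimal sketch of Weisfeiler-Lehman subtree features. It is not the authors' implementation: the adjacency-dictionary input format, the function names `wl_features` and `wl_kernel`, the use of a single initial label per node, and the truncated SHA-1 label compression are all assumptions made for illustration.

```python
from collections import Counter
import hashlib


def wl_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    adjacency: dict mapping each node to a list of neighbour nodes
               (hypothetical input format; every node must be a key).
    labels:    dict mapping each node to an initial string label,
               for example a coarse instruction or API category.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Join the node's label with the sorted multiset of neighbour labels.
            neighbour_part = ",".join(sorted(current[n] for n in neighbours))
            signature = current[node] + "|" + neighbour_part
            # Compress the signature into a short, stable label.
            updated[node] = hashlib.sha1(signature.encode()).hexdigest()[:12]
        current = updated
        features.update(current.values())
    return features


def wl_kernel(features_a, features_b):
    """Dot product of two WL label-count vectors."""
    return sum(count * features_b[label] for label, count in features_a.items())


# Usage: a three-node path and a triangle with identical initial labels.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
uniform = {"a": "x", "b": "x", "c": "x"}

path_features = wl_features(path, uniform, iterations=1)
triangle_features = wl_features(triangle, uniform, iterations=1)
similarity = wl_kernel(path_features, triangle_features)
```

Because the relabeling depends only on labels and neighbourhood structure, two graphs that differ only in node naming produce identical count vectors, which is the property that makes such features useful when identifiers are obfuscated.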
Finally, we discuss these results in the context of the broader ecosystem. We find that, of the applications in the expanded dataset that are still available through APKPure, the majority (51.4%) still contain embedded proxy SDK code. Further analysis of developer accounts reveals that 23 developers have published other applications that also contain such functionality, suggesting continuous and ongoing commercial relationships between proxy providers and developers.
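The grouped cross-validation mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes each sample carries a group key such as a hash of its DEX file, so that apps sharing the same code never appear on both sides of a split. The function name `grouped_kfold` and the greedy balancing strategy are hypothetical; scikit-learn's `GroupKFold` provides a comparable library facility.

```python
from collections import defaultdict


def grouped_kfold(group_ids, n_splits=5):
    """Yield (train_indices, test_indices) with no group shared between them.

    group_ids: one group key per sample (assumed here to be a DEX hash).
    """
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)

    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])

    for i in range(n_splits):
        test = sorted(folds[i])
        train = sorted(
            index
            for j, fold in enumerate(folds)
            if j != i
            for index in fold
        )
        yield train, test


# Usage: ten samples drawn from six hypothetical DEX groups.
groups = ["d1", "d1", "d2", "d3", "d3", "d3", "d4", "d5", "d6", "d6"]
splits = list(grouped_kfold(groups, n_splits=5))
```

Splitting by group rather than by sample matters because repackaged or near-identical apps would otherwise leak from training into testing and inflate the reported score.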