Malware families discovery via Open-Set Recognition on Android manifest permissions

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the failure of conventional closed-set classification in the face of continuously emerging Android malware families, this paper proposes an open-set recognition framework based on permission features. We are the first to adapt the MaxLogit method—originally developed for computer vision—to Android malware analysis. Our approach integrates high-dimensional sparse modeling of Manifest-permission features with gradient-boosted decision trees (GBDT), enabling joint fine-grained classification of known families and reliable detection of unknown ones. The method incurs low computational overhead and exhibits strong scalability. Extensive evaluation across multiple public and private datasets demonstrates significant improvements in unknown-family detection rates (+12.7% to +28.3%), while maintaining a false positive rate below 1.5%. The framework has been integrated into an enterprise-grade mobile security protection system and deployed in production.

📝 Abstract
Malware programs are grouped into families based on their penetration technique, source code, and other characteristics. Classifying malware into the correct family is essential for building effective defenses against cyber threats. Machine learning models have great potential for malware detection on mobile devices, since malware families can be recognized by classifying permission data extracted from Android manifest files. Still, the classification task is challenging because permission data is high-dimensional and training samples are scarce. In particular, the steady emergence of new malware families makes it impossible to acquire a comprehensive training set covering all malware classes. In this work, we present a malware classification system that, in addition to classifying known malware families, detects new ones. Specifically, we combine MaxLogit, an open-set recognition technique developed within the computer vision community, with a tree-based Gradient Boosting classifier, which is particularly effective on high-dimensional data. Our solution is practical, as it can be employed seamlessly in a standard classification workflow, and efficient, as it adds minimal computational overhead. Experiments on public and proprietary datasets demonstrate the potential of our solution, which has been deployed in a business environment.
Problem

Research questions and friction points this paper is trying to address.

Classifying Android malware into families using manifest permissions
Detecting new malware families with open-set recognition techniques
Handling high-dimensional permission data with limited training samples
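
To make the input representation concrete, the following is a minimal sketch of how manifest permissions could be turned into binary feature vectors. It is an illustration only, not the paper's pipeline: the helper names (`extract_permissions`, `build_vocabulary`, `to_binary_vector`) and the two sample manifests are hypothetical, and the sketch assumes the manifest has already been decoded to plain-text XML (manifests inside an APK are stored in a binary XML format).

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# Namespace used by android:* attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml: str) -> Set[str]:
    """Collect the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    key = ANDROID_NS + "name"
    return {
        elem.attrib[key]
        for elem in root.iter("uses-permission")
        if key in elem.attrib
    }


def build_vocabulary(permission_sets: Iterable[Set[str]]) -> List[str]:
    """Sorted list of all permissions seen across the given apps."""
    return sorted(set().union(*permission_sets))


def to_binary_vector(permissions: Set[str], vocabulary: List[str]) -> List[int]:
    """One entry per vocabulary permission: 1 if requested, else 0."""
    return [1 if name in permissions else 0 for name in vocabulary]


# Hypothetical, hand-written manifests used only for illustration.
MANIFEST_A = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.a">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

MANIFEST_B = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.b">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
</manifest>"""

permission_sets = [extract_permissions(m) for m in (MANIFEST_A, MANIFEST_B)]
vocabulary = build_vocabulary(permission_sets)
vectors = [to_binary_vector(p, vocabulary) for p in permission_sets]
```

With a realistic vocabulary, most entries of each vector are zero, which is the high-dimensional, sparse setting the points above refer to.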
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-set recognition for detecting new malware families
Combines MaxLogit with Gradient Boosting classifier
Handles high-dimensional Android permission data efficiently
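
The MaxLogit idea listed above can be sketched as follows: a sample's score is its largest raw class score (logit), and a sample whose score falls below a threshold is flagged as belonging to an unknown family. This is a hedged illustration, not the authors' implementation. The logits are hard-coded stand-ins for the raw per-class scores a gradient boosting classifier would produce, and the quantile-based threshold calibration, the `target_fpr` parameter, and the family names are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def max_logit_scores(logits: Sequence[Sequence[float]]) -> List[float]:
    """MaxLogit score of each sample: the largest raw class score."""
    return [max(row) for row in logits]


def calibrate_threshold(known_scores: Sequence[float], target_fpr: float = 0.05) -> float:
    """Assumed calibration rule: choose a threshold so that at most a
    target_fpr fraction of known-family validation samples score below it."""
    ordered = sorted(known_scores)
    index = min(int(target_fpr * len(ordered)), len(ordered) - 1)
    return ordered[index]


def predict_open_set(
    logits: Sequence[Sequence[float]],
    classes: Sequence[str],
    threshold: float,
) -> List[Optional[str]]:
    """Return the predicted family, or None when the sample is flagged as unknown."""
    predictions: List[Optional[str]] = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)
        predictions.append(classes[best] if row[best] >= threshold else None)
    return predictions


# Hypothetical family labels and logits, standing in for classifier output.
classes = ["family_a", "family_b", "family_c"]

# Raw scores on validation samples that all belong to known families.
validation_logits = [
    [4.0, -1.0, 0.5],
    [0.2, 3.5, -0.7],
    [-0.3, 0.1, 5.2],
    [2.8, 0.4, -1.1],
]
threshold = calibrate_threshold(max_logit_scores(validation_logits))

# First sample looks like family_a; second has no confident class.
test_logits = [
    [3.9, 0.1, -0.5],
    [0.3, 0.4, 0.2],
]
predictions = predict_open_set(test_logits, classes, threshold)
```

Because the score is derived from outputs the classifier already computes, the rejection step adds only a maximum and a comparison per sample, which is consistent with the low overhead the summary describes.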