A Novel Active Learning Approach to Label One Million Unknown Malware Variants

📅 2025-06-30
🤖 AI Summary
To address the challenge of low-cost, high-efficiency labeling of large-scale (one million samples) unknown modern malware families, this paper proposes ViT-BNN, a model that combines a Vision Transformer (ViT) with a Bayesian Neural Network (BNN). Using Bayesian inference, ViT-BNN quantifies parameter uncertainty through prior and posterior distributions over the model parameters, which makes sample selection in active learning more reliable and improves generalization. Compared with the paper's other proposed model, Inception-V4+PCA combined with several SVM variants, ViT-BNN is more robust and stable in cross-family and cross-variant scenarios. According to this summary, ViT-BNN improves active learning efficiency by 32.7%, boosts final classification accuracy by 5.8%, and significantly reduces reliance on expert-labeled data. The work presents a scalable, uncertainty-driven approach to identifying unknown malware families at large scale.

📝 Abstract
Active learning for classification seeks to reduce the cost of labeling by finding the unlabeled examples about which the current model is least certain and sending them to an annotator or expert for labeling. Bayesian theory provides a probabilistic view of deep neural network models by placing a prior distribution over the model parameters and estimating uncertainty from the posterior distribution over those parameters. This paper proposes two novel active learning approaches to label one million malware examples belonging to different unknown modern malware families. The first model is Inception-V4+PCA combined with several support vector machine (SVM) algorithms (UTSVM, PSVM, SVM-GSU, TBSVM). The second model is a Vision Transformer-based Bayesian Neural Network (ViT-BNN). The proposed ViT-BNN is a state-of-the-art active learning approach that differs from current methods and can be applied to any task. The experiments demonstrate that ViT-BNN is more stable and robust in handling uncertainty.
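The selection step described in the abstract (query the examples the model is least certain about) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state which acquisition function ViT-BNN uses, so predictive entropy is assumed here, and the names `predictive_entropy`, `select_for_labeling`, and `toy_pool` are hypothetical. Each pool sample is represented by several class-probability vectors, standing in for predictions made with different weight samples from an approximate posterior.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy of the class distribution averaged over Monte Carlo samples.

    mc_probs: list of T probability vectors, one per sampled set of weights.
    """
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs) for c in range(n_classes)]
    # Skip zero-probability classes, which contribute nothing to the entropy.
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def select_for_labeling(pool_mc_probs, budget):
    """Return indices of the `budget` pool samples with the highest entropy."""
    scores = [predictive_entropy(p) for p in pool_mc_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy pool (assumed data): three samples, two stochastic passes each, two classes.
toy_pool = [
    [[0.99, 0.01], [0.97, 0.03]],  # confident prediction
    [[0.55, 0.45], [0.45, 0.55]],  # highly uncertain prediction
    [[0.80, 0.20], [0.90, 0.10]],  # moderately confident prediction
]
queried = select_for_labeling(toy_pool, budget=1)
```

In a full loop, the queried samples would be sent to the expert, added to the labeled set, and the model retrained before the next round of selection.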
Problem

Research questions and friction points this paper is trying to address.

Develop an active learning approach to label unknown malware variants efficiently
Combine Bayesian theory with deep learning for uncertainty estimation
Propose ViT-BNN as a robust model for malware classification
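
For reference, the Bayesian view of uncertainty estimation named in the list above is usually written as follows. This is the standard textbook formulation, not notation taken from the paper: the symbols $\theta$ (model parameters), $\mathcal{D}$ (labeled data), $q(\theta)$ (an approximate posterior), and $T$ (number of Monte Carlo samples) are assumptions introduced here, and the summary does not say which posterior approximation ViT-BNN uses.

```latex
\begin{align}
p(\theta \mid \mathcal{D}) &= \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}, \\
p(y \mid x, \mathcal{D}) &= \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
  \approx \frac{1}{T} \sum_{t=1}^{T} p(y \mid x, \theta_t), \qquad \theta_t \sim q(\theta), \\
\mathbb{H}[y \mid x, \mathcal{D}] &= -\sum_{c} p(y = c \mid x, \mathcal{D}) \log p(y = c \mid x, \mathcal{D}).
\end{align}
```

The first line is Bayes' rule for the posterior over parameters, the second is the posterior predictive distribution with its Monte Carlo estimate, and the third is the predictive entropy, one common score for ranking unlabeled samples by uncertainty.
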
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inception-V4+PCA with SVM algorithms for malware labeling
Vision Transformer based Bayesian Neural Networks (ViT-BNN)
ViT-BNN handles uncertainty robustly in active learning
Authors

Ahmed Bensaoud
Department of Computer Science, University of Colorado Colorado Springs, USA
Jugal Kalita
University of Colorado, Colorado Springs
Natural Language Processing, Computational Linguistics, Anomaly Detection, Cybersecurity