MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification

📅 2025-01-02

🤖 AI Summary

To address catastrophic forgetting in continual learning for malware classification, this paper proposes MalCL, a generative replay method based on Generative Adversarial Networks (GANs). The method aims to preserve knowledge of previously learned malware classes while new classes are learned incrementally. Its key contributions are: (1) a feature-matching loss that improves the quality of generated samples, and (2) replay-sample selection schemes based on the model's hidden representations. Evaluated in a class-incremental setting on Windows and Android malware datasets, the approach reduces performance degradation on previously learned classes. On Windows malware samples it reaches an average accuracy of 55%, outperforming other generative-replay-based models by 28%. The results offer practical insights for advancing generative-replay-based malware classification.
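
The idea of selecting replay samples from hidden representations can be illustrated with a minimal sketch. The summary does not state the actual selection rule, so the rule below (keep the `k` generated samples per class whose hidden vectors have the smallest L1 distance to that class's centroid) is an assumption made for illustration, and `select_replay_samples`, `gen_hidden`, `gen_labels`, and `real_centroids` are hypothetical names.

```python
import numpy as np

def select_replay_samples(gen_hidden, gen_labels, real_centroids, k):
    """Pick, per class, the k generated samples nearest to the class centroid.

    Assumption (not from the paper): "nearest" means smallest L1 distance
    between a sample's hidden vector and the centroid of real samples.
    Returns indices into gen_hidden, grouped by class in dict order.
    """
    selected = []
    for cls, centroid in real_centroids.items():
        idx = np.flatnonzero(gen_labels == cls)   # generated samples of this class
        if idx.size == 0:
            continue                              # nothing generated for this class
        dist = np.abs(gen_hidden[idx] - centroid).sum(axis=1)
        order = np.argsort(dist, kind="stable")[:k]
        selected.extend(idx[order].tolist())
    return selected

# Toy data: five generated samples with 2-dimensional hidden vectors.
gen_hidden = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [4.0, 4.0], [9.0, 9.0]])
gen_labels = np.array([0, 0, 1, 1, 1])
real_centroids = {0: np.array([0.0, 0.0]), 1: np.array([5.0, 5.0])}

top1 = select_replay_samples(gen_hidden, gen_labels, real_centroids, k=1)
top2 = select_replay_samples(gen_hidden, gen_labels, real_centroids, k=2)
```

In a replay setup, the selected indices would pick which synthetic samples are mixed with new-task data when the classifier is retrained.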

📝 Abstract

Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario, where new classes are introduced continuously over multiple tasks, demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%. This study provides practical insights for advancing GR-based malware classification systems. The implementation is available at https://github.com/MalwareReplayGAN/MalCL (the code will be made public upon the presentation of the paper).
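
As a rough illustration of the feature matching loss mentioned in the abstract, the sketch below computes the common formulation: the squared L2 distance between the batch-mean intermediate features of real samples and of generated samples. The abstract does not give the exact loss used in MalCL, so this formulation, the function name `feature_matching_loss`, and the toy feature arrays are assumptions; in a real system the features would come from an intermediate layer of the discriminator.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between batch-mean features.

    real_feats, fake_feats: arrays of shape (batch, feature_dim) holding
    intermediate-layer activations for real and generated samples.
    Assumption: this is the standard formulation, not necessarily MalCL's.
    """
    real_mean = real_feats.mean(axis=0)
    fake_mean = fake_feats.mean(axis=0)
    return float(np.sum((real_mean - fake_mean) ** 2))

# Toy stand-ins for discriminator features (hypothetical data).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 16))
close = real + rng.normal(0.0, 0.01, size=real.shape)  # nearly matching statistics
far = real + 3.0                                       # every feature mean shifted by 3

loss_same = feature_matching_loss(real, real)
loss_close = feature_matching_loss(real, close)
loss_far = feature_matching_loss(real, far)
```

A generator trained with this term is pushed to match the feature statistics of real data rather than only to fool the discriminator's final output, which is why the shifted batch `far` scores much worse than `close`.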

Problem

Research questions and friction points this paper is trying to address.

Continual Learning

Malware Detection

Memory Retention

Innovation

Methods, ideas, or system contributions that make the work stand out.

MalCL

Generative Adversarial Network (GAN)

Continual Learning

🔎 Similar Papers

MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning

2024-09-20 · arXiv.org · Citations: 0
