🤖 AI Summary
To address the reliance on large-scale labeled data and the poor generalization of supervised models in zero-day malware detection, this paper proposes a novel deep packet inspection paradigm that integrates self-supervised pretraining with few-shot adaptation. Methodologically, it introduces Transformer-based modeling of raw byte-level network packets for the first time, using masked language modeling (MLM) for unsupervised representation learning, followed by prototypical networks for few-shot threat classification. Key contributions include: (1) end-to-end semantic learning directly from byte sequences; (2) strong generalization to unseen malware families without extensive labeling; and (3) state-of-the-art accuracy of up to 94.76% on UNSW-NB15 and 83.25% on CIC-IoT23, significantly outperforming supervised baselines while remaining robust under extreme data scarcity (1–5 samples per class).
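The few-shot step named above can be illustrated with a minimal sketch of prototype-based classification. It assumes that packet embeddings are already available as plain lists of floats, that each class prototype is the mean of its support embeddings, and that Euclidean distance is the metric. The function names, labels, and toy 2-D vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the embeddings of each class in the support set."""
    grouped = defaultdict(list)
    for label, vec in support:
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D embeddings standing in for transformer outputs (made-up values).
support = [
    ("benign", [0.0, 0.0]),
    ("benign", [0.2, 0.0]),
    ("exploit", [1.0, 1.0]),
    ("exploit", [1.0, 0.8]),
]
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.9], prototypes)
```

With only two support samples per class, adding a new attack family in this sketch amounts to computing one more mean vector, which is why prototype-based methods suit the 1–5 samples-per-class setting the summary describes.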
📝 Abstract
As networks continue to expand and become more interconnected, the need for novel malware detection methods becomes more pronounced. Traditional security measures are increasingly inadequate against the sophistication of modern cyber attacks. Deep Packet Inspection (DPI) has been pivotal in enhancing network security, offering an in-depth analysis of network traffic that surpasses conventional monitoring techniques. DPI not only examines the metadata of network packets, but also inspects the actual content carried within the packet payloads, providing a comprehensive view of the data flowing through networks. While the integration of advanced deep learning techniques with DPI has introduced modern methodologies into malware detection and network traffic classification, state-of-the-art supervised learning approaches are limited by their reliance on large amounts of annotated data and their inability to generalize to novel, unseen malware threats. To address these limitations, this paper leverages recent advancements in self-supervised learning (SSL) and few-shot learning (FSL). Our proposed self-supervised approach trains a transformer via SSL to learn an embedding of packet content, including payload, from vast amounts of unlabeled data by masking portions of packets, leading to a learned representation that generalizes to various downstream tasks. Once the representations are extracted from the packets, they are used to train a malware detection algorithm. The representation obtained from the transformer is then used to adapt the malware detector to novel types of attacks using few-shot learning approaches. Our experimental results demonstrate that our method achieves classification accuracies of up to 94.76% on the UNSW-NB15 dataset and 83.25% on the CIC-IoT23 dataset.
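The idea of masking portions of packets can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the mask token id `MASK_ID`, the ignore marker `IGNORE`, the per-byte masking probability, and the sample bytes are all assumptions chosen for the example. Each byte is treated as one token; masked positions are replaced by a special id in the model input, and the original byte becomes the prediction target.

```python
import random

MASK_ID = 256   # hypothetical token id outside the 0-255 byte range
IGNORE = -100   # hypothetical marker for positions that carry no loss


def mask_packet(packet, mask_prob=0.15, seed=None):
    """Build (inputs, targets) for masked-byte pretraining on one packet."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for byte in packet:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)   # hide the byte from the model
            targets.append(byte)     # the model must reconstruct it
        else:
            inputs.append(byte)      # byte stays visible
            targets.append(IGNORE)   # no prediction needed here
    return inputs, targets


# Ten made-up bytes standing in for the start of a packet.
packet = bytes.fromhex("450000281c4640004006")
inputs, targets = mask_packet(packet, mask_prob=0.3, seed=0)
```

Because the targets come from the packet itself, no labels are required, which is what allows pretraining on large volumes of unlabeled traffic.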