🤖 AI Summary
To address poor generalizability and weak robustness in intrusion detection caused by network traffic heterogeneity and diverse attack patterns, this paper proposes BERTector, a novel framework grounded in pre-trained language models. First, it introduces the NSS-Tokenizer, a semantic-aware tokenization module tailored to network traffic. Second, it establishes a joint multi-source dataset training paradigm that supports both supervised fine-tuning and collaborative optimization across heterogeneous data. Third, it pioneers the integration of Low-Rank Adaptation (LoRA) into intrusion detection, markedly improving training efficiency and parameter-update stability. Evaluated on multiple benchmark datasets, BERTector achieves state-of-the-art detection accuracy while demonstrating superior cross-domain generalization and robustness against adversarial perturbations. This work establishes a scalable paradigm for network security analysis based on pre-trained language models.
📝 Abstract
Intrusion detection systems (IDS) face challenges in generalization and robustness due to the heterogeneity of network traffic and the diversity of attack patterns. To address this, we propose a joint-dataset training paradigm for IDS and introduce BERTector, a scalable framework built on BERT. BERTector integrates three key components: NSS-Tokenizer for traffic-aware semantic tokenization, supervised fine-tuning on a hybrid dataset, and low-rank adaptation (LoRA) for efficient training. Extensive experiments show that BERTector achieves state-of-the-art detection accuracy, strong cross-dataset generalization, and excellent robustness to adversarial perturbations. This work establishes a unified and efficient solution for modern IDS in complex and dynamic network environments.
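The abstract highlights LoRA as the mechanism that keeps fine-tuning efficient. The paper's own configuration is not given here, so the following is only a minimal NumPy sketch of the general LoRA idea: the pretrained weight matrix is frozen, and only a low-rank residual update `B @ A` is trained. The dimensions (a 768-wide BERT-style projection, rank 8, alpha 16) are illustrative assumptions, not values from the paper.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen pretrained weight and a trainable
    low-rank update, as in Low-Rank Adaptation (LoRA)."""

    def __init__(self, d_out, d_in, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small init
        self.B = np.zeros((d_out, r))                   # trainable, zero init
        self.scale = alpha / r                          # scaling of the update

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T  (frozen path + low-rank residual)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B are updated during fine-tuning; W stays frozen.
        return self.A.size + self.B.size

layer = LoRALinear(d_out=768, d_in=768, r=8)
full = layer.W.size                 # 589,824 params if fully fine-tuned
low = layer.trainable_params()      # 12,288 params with rank-8 LoRA
print(f"trainable fraction: {low / full:.3f}")
```

Because `B` starts at zero, the adapted layer is exactly the pretrained layer at initialization; training then moves only ~2% of the parameters per adapted matrix, which is the efficiency gain the abstract refers to.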