BERTector: Intrusion Detection Based on Joint-Dataset Learning

📅 2025-08-14
🤖 AI Summary
To address poor generalizability and weak robustness in intrusion detection caused by network traffic heterogeneity and diverse attack patterns, this paper proposes BERTector, a framework built on pre-trained language models. First, it introduces the NSS-Tokenizer, a semantic-aware tokenization module tailored for network traffic. Second, it establishes a joint multi-source dataset training paradigm that supports both supervised fine-tuning and collaborative optimization across heterogeneous data. Third, it integrates Low-Rank Adaptation (LoRA) into intrusion detection to improve training efficiency and parameter update stability. Evaluated on multiple benchmark datasets, BERTector achieves state-of-the-art detection accuracy while demonstrating strong cross-domain generalization and robustness against adversarial perturbations. This work establishes a scalable paradigm for network security analysis based on pre-trained language models.

📝 Abstract
Intrusion detection systems (IDS) face challenges in generalization and robustness due to the heterogeneity of network traffic and the diversity of attack patterns. To address this issue, we propose a new joint-dataset training paradigm for IDS together with BERTector, a scalable framework based on BERT. BERTector integrates three key components: NSS-Tokenizer for traffic-aware semantic tokenization, supervised fine-tuning with a hybrid dataset, and low-rank adaptation (LoRA) for efficient training. Extensive experiments show that BERTector achieves state-of-the-art detection accuracy, strong cross-dataset generalization capabilities, and excellent robustness to adversarial perturbations. This work establishes a unified and efficient solution for modern IDS in complex and dynamic network environments.
Problem

Research questions and friction points this paper is trying to address.

Improving generalization in intrusion detection systems
Enhancing robustness against diverse attack patterns
Addressing network traffic heterogeneity challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint-dataset training paradigm for IDS
BERTector framework with NSS-Tokenizer
LoRA for efficient model training
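
To make the LoRA item above more concrete, here is a minimal pure-Python sketch of the general low-rank adaptation idea: a frozen weight matrix W is left untouched and only a low-rank product B·A, scaled by alpha / rank, is trained. This is not BERTector's implementation. The class name `LoRALinear`, the helper `lora_param_counts`, the rank of 8, and the 768×768 matrix size (the hidden size of BERT-base) are illustrative assumptions; the summary does not state which rank or which layers the authors adapt.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical illustration: frozen W (d_out x d_in) plus trainable B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # frozen pre-trained weights
        self.scale = alpha / rank         # standard LoRA scaling factor
        # A starts small and random, B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to one input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, trainable parameters under LoRA)."""
    return d_out * d_in, rank * (d_out + d_in)


if __name__ == "__main__":
    full, lora = lora_param_counts(768, 768, 8)  # assumed sizes, see lead-in
    print(f"full matrix: {full} parameters, LoRA update: {lora} parameters")
```

Under these assumed sizes, the low-rank update trains 8 × (768 + 768) = 12,288 values per adapted matrix instead of 768 × 768 = 589,824, which is the source of the efficiency gain usually attributed to LoRA.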
Haoyang Hu
Master Student at The University of Hong Kong
Trustworthy LLM, AI for Security
Xun Huang
Unknown affiliation
Generative Models
Chenyu Wu
Tsinghua University
Turbulence modeling, machine learning
Shiwen Liu
School of Cyber Science and Engineering, Nanjing University of Science and Technology, China
Zhichao Lian
School of Cyber Science and Engineering, Nanjing University of Science and Technology, China
Shuangquan Zhang
School of Cyber Science and Engineering, Nanjing University of Science and Technology, China