🤖 AI Summary
To address the challenge of malicious traffic detection in Internet-of-Things (IoT) networks, this paper proposes a multimodal deep learning framework that integrates graph representation learning and sequential modeling. Methodologically, it couples GraphSAGE (to model device-level topological relationships), BERT (to capture long-range semantic dependencies in packet sequences), and a TCN augmented with multi-head attention (to characterize temporal dynamics), with BI-LSTM and LSTM models serving as comparative baselines. Experiments on mainstream IoT datasets demonstrate that the BERT variant achieves the strongest performance: 99.94% accuracy, a 99.99% F1-score, and the highest AUC-ROC; GraphSAGE attains the fastest training convergence; and multi-head attention substantially improves model interpretability. This work constitutes the first systematic validation of pretrained language models' advantages for IoT traffic modeling, establishing a paradigm for lightweight, high-accuracy, and interpretable edge-based intrusion detection.
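The summary does not include implementation details. As an illustration only, the device-level graph component can be sketched as a single GraphSAGE layer with mean aggregation over neighbour features; the aggregator choice, shapes, and all variable names here are assumptions, not the paper's code:

```python
import numpy as np

def graphsage_mean_layer(features, adj, W_self, W_neigh):
    """One GraphSAGE layer with mean aggregation (assumed aggregator):
    h_v = ReLU(W_self @ x_v + W_neigh @ mean(x_u for u in N(v)))."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1                      # isolated nodes keep a zero neighbour mean
    neigh_mean = (adj @ features) / deg    # mean of neighbour feature vectors
    h = features @ W_self.T + neigh_mean @ W_neigh.T
    return np.maximum(h, 0.0)              # ReLU

# Toy example: 5 devices with 8 flow features each, random topology and weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
A = (rng.random((5, 5)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
H = graphsage_mean_layer(X, A, rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
```

Stacking such layers lets each device embedding absorb traffic statistics from progressively larger neighbourhoods, which is what makes graph structure usable for per-device anomaly scoring.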
📝 Abstract
This paper addresses the detection of malicious IoT attacks with deep learning models and presents a comprehensive evaluation of deep learning and graph-based models for malicious network traffic detection. The evaluated models are based on GraphSAGE, Bidirectional Encoder Representations from Transformers (BERT), a Temporal Convolutional Network (TCN) with Multi-Head Attention, a BI-LSTM with Multi-Head Attention, and standalone BI-LSTM and LSTM models. The chosen models showed strong ability to model temporal patterns and to identify significant features. This performance stems largely from the fact that IoT system traffic is both sequential and diverse, leaving a rich set of temporal patterns for the models to learn. Experimental results showed that BERT delivered the best performance: it achieved 99.94% accuracy along with high precision and recall, and F1-score and AUC-ROC scores of 99.99%, demonstrating its ability to capture temporal dependencies. The Multi-Head Attention model offered promising detection capabilities with interpretable results, although it required substantial processing time, comparable to the BI-LSTM variants. The GraphSAGE model required the shortest training time but yielded the lowest accuracy, precision, and F1-score among the evaluated models.
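The interpretability claimed for the Multi-Head Attention model comes from its attention weights, which indicate how strongly each position in a packet sequence attends to every other position. A minimal NumPy sketch of scaled dot-product multi-head self-attention illustrates this; random projection matrices stand in for learned parameters, and all names and dimensions here are assumptions rather than the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Scaled dot-product multi-head self-attention over a feature sequence.

    x: (seq_len, d_model) array of per-packet feature vectors.
    Returns (output, weights); weights has shape (num_heads, seq_len, seq_len)
    and shows which time steps each head attends to.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    # Random projections stand in for learned Q/K/V weights (illustration only).
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3))
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out, weights

# Toy example: a sequence of 10 packets with 16 features each, 4 heads.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16))
out, w = multi_head_attention(x, num_heads=4, rng=rng)
```

Inspecting `w` after training is the usual route to interpretability: rows with sharply peaked weights point at the packets that drove a given detection decision.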