🤖 AI Summary
Existing pre-trained models struggle to reconcile the mismatch between the multi-granular, heterogeneous structures inherent in network traffic—such as bytes, protocols, and packets—and the flat input representations typically assumed by standard architectures. To address this limitation, this work proposes Nethira, the first hierarchical self-supervised model that explicitly incorporates multi-level traffic structure into pre-training. Nethira learns structure-aware representations through hierarchical reconstruction and augmentation mechanisms, and further reduces reliance on labeled data during fine-tuning by introducing consistency regularization. Experimental results demonstrate that Nethira achieves an average F1-score improvement of 9.11% across four public datasets and, remarkably, matches the performance of fully supervised methods using only 1% of labeled data on highly heterogeneous tasks.
📝 Abstract
Network traffic classification is vital for network security and management. Pre-training has shown promise by learning general traffic representations from raw byte sequences, thereby reducing reliance on labeled data. However, existing pre-trained models struggle with the gap between traffic heterogeneity (i.e., hierarchical traffic structures) and input homogeneity (i.e., flattened byte sequences). To address this gap, we propose Nethira, a heterogeneity-aware pre-trained model based on hierarchical reconstruction and augmentation. In pre-training, Nethira introduces hierarchical reconstruction at multiple levels (byte, protocol, and packet), capturing comprehensive traffic structural information. During fine-tuning, Nethira applies a consistency-regularized strategy with hierarchical traffic augmentation to reduce label dependence. Experiments on four public datasets demonstrate that Nethira outperforms seven existing pre-trained models, achieving an average F1-score improvement of 9.11%, and reaches comparable performance with only 1% of labeled data on high-heterogeneity network tasks.
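The abstract's consistency-regularized fine-tuning can be understood as combining a supervised loss on the small labeled set with a consistency term that penalizes prediction divergence between a traffic sample and its hierarchically augmented view. The sketch below is illustrative only: the function names, the KL-divergence form of the consistency term, and the weighting coefficient `lam` are assumptions, not Nethira's published formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over class logits.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_finetune_loss(logits_labeled, labels,
                              logits_orig, logits_aug, lam=1.0):
    """Illustrative semi-supervised objective (assumed form, not Nethira's
    exact loss): cross-entropy on the labeled batch plus a KL consistency
    penalty between predictions on original and augmented traffic views."""
    # Supervised term: cross-entropy on the (possibly tiny) labeled batch.
    p = softmax(logits_labeled)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    # Consistency term: predictions on an augmented view should match
    # predictions on the original view.
    p_o = softmax(logits_orig)
    p_a = softmax(logits_aug)
    kl = (p_o * (np.log(p_o) - np.log(p_a))).sum(axis=-1).mean()
    return ce + lam * kl
```

When the augmented view leaves predictions unchanged, the KL term vanishes and the loss reduces to plain supervised cross-entropy; as augmentation perturbs predictions, the extra term pushes the model toward augmentation-invariant representations, which is how such schemes stretch 1% labeled data.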