🤖 AI Summary
Existing pre-trained models struggle to reconcile the mismatch between the multi-granular, heterogeneous structures inherent in network traffic—such as bytes, protocols, and packets—and the flat input representations typically assumed by standard architectures. To address this limitation, this work proposes Nethira, the first hierarchical self-supervised model that explicitly incorporates multi-level traffic structure into pre-training. Nethira learns structure-aware representations through hierarchical reconstruction and augmentation mechanisms, and further reduces reliance on labeled data during fine-tuning by introducing consistency regularization. Experimental results demonstrate that Nethira achieves an average F1-score improvement of 9.11% across four public datasets and, remarkably, matches the performance of fully supervised methods using only 1% of labeled data on highly heterogeneous tasks.
📝 Abstract
Network traffic classification is vital for network security and management. Pre-training has shown promise by learning general traffic representations from raw byte sequences, thereby reducing reliance on labeled data. However, existing pre-trained models struggle with the gap between traffic heterogeneity (i.e., hierarchical traffic structures) and input homogeneity (i.e., flattened byte sequences). To address this gap, we propose Nethira, a heterogeneity-aware pre-trained model based on hierarchical reconstruction and augmentation. In pre-training, Nethira introduces hierarchical reconstruction at multiple levels (byte, protocol, and packet), capturing comprehensive traffic structural information. During fine-tuning, Nethira applies a consistency-regularized strategy with hierarchical traffic augmentation to reduce label dependence. Experiments on four public datasets demonstrate that Nethira outperforms seven existing pre-trained models, achieving an average F1-score improvement of 9.11%, and reaches comparable performance with only 1% of labeled data on high-heterogeneity network tasks.
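The abstract's consistency-regularized fine-tuning can be understood as combining a supervised loss on the small labeled set with a consistency term that penalizes prediction divergence between a traffic sample and its hierarchically augmented view. The sketch below is illustrative only: the function names, the KL-divergence form of the consistency term, and the weighting coefficient `lam` are assumptions, not Nethira's published formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over class logits.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_finetune_loss(logits_labeled, labels,
                              logits_orig, logits_aug, lam=1.0):
    """Illustrative semi-supervised objective (assumed form, not Nethira's
    exact loss): cross-entropy on the labeled batch plus a KL consistency
    penalty between predictions on original and augmented traffic views."""
    # Supervised term: cross-entropy on the (possibly tiny) labeled batch.
    p = softmax(logits_labeled)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    # Consistency term: predictions on an augmented view should match
    # predictions on the original view.
    p_o = softmax(logits_orig)
    p_a = softmax(logits_aug)
    kl = (p_o * (np.log(p_o) - np.log(p_a))).sum(axis=-1).mean()
    return ce + lam * kl
```

When the augmented view leaves predictions unchanged, the KL term vanishes and the loss reduces to plain supervised cross-entropy; as augmentation perturbs predictions, the extra term pushes the model toward augmentation-invariant representations, which is how such schemes stretch 1% labeled data.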