🤖 AI Summary
To address the high computational overhead and low efficiency of traditional traffic classification methods amid surging encrypted traffic, this paper proposes NetMatrix, an RFC-driven lightweight traffic representation. Instead of converting traffic into images or text, NetMatrix extracts the structured protocol fields defined in official RFC specifications and arranges them as compact tabular features. Combined with XGBoost, it enables highly efficient classification while maintaining competitive accuracy. This work pioneers the deep integration of protocol semantic modeling with a minimalistic representation, empirically validating the "less-is-more" paradigm in encrypted traffic classification. Experimental results show that NetMatrix with XGBoost matches state-of-the-art models, including ET-BERT and YaTC, in classification accuracy, while achieving over 10× faster inference and reducing memory footprint by two to three orders of magnitude.
📝 Abstract
The rapid growth of encryption has significantly enhanced privacy and security while posing challenges for network traffic classification. Recent approaches address these challenges by transforming network traffic into text or image formats in order to leverage deep-learning models originally designed for natural language processing and computer vision. However, these transformations often contradict network protocol specifications, introduce noisy features, and result in resource-intensive processes. To overcome these limitations, we propose NetMatrix, a minimalistic tabular representation of network traffic that eliminates noisy attributes and focuses on meaningful features by leveraging RFC (Request for Comments) definitions. By combining NetMatrix with a vanilla XGBoost classifier, we implement a lightweight approach, LiM ("Less is More"), that achieves classification performance on par with state-of-the-art methods such as ET-BERT and YaTC. Experimental evaluations demonstrate that, compared to selected baselines, LiM reduces resource consumption by orders of magnitude. Overall, this study underscores the effectiveness of simplicity in traffic representation and machine learning model selection, paving the way towards resource-efficient network traffic classification.
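To make the idea of an RFC-based tabular representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a raw IPv4 packet carrying TCP and decodes the fixed header fields as laid out in RFC 791 and RFC 9293 into one flat row. The function name `parse_ipv4_tcp`, the column names, and the choice to drop addresses and checksums are all illustrative assumptions; the abstract does not list the actual NetMatrix field set.

```python
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Decode IPv4 and TCP fixed-header fields into one flat feature row.

    Hypothetical helper for illustration; the column set is an assumption.
    """
    if len(packet) < 20:
        raise ValueError("packet shorter than the 20-byte IPv4 fixed header")

    # IPv4 fixed header (RFC 791), network byte order, 20 bytes.
    (ver_ihl, tos, total_len, _ident, flags_frag,
     ttl, proto, _checksum, _src, _dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])

    ihl = (ver_ihl & 0x0F) * 4  # header length, converted from words to bytes
    row = {
        "ip_version": ver_ihl >> 4,
        "ip_ihl": ihl,
        "ip_tos": tos,
        "ip_total_len": total_len,
        "ip_flags": flags_frag >> 13,          # 3-bit flags field
        "ip_frag_offset": flags_frag & 0x1FFF,  # 13-bit fragment offset
        "ip_ttl": ttl,
        "ip_proto": proto,
    }

    # Only TCP (protocol number 6) is decoded further in this sketch.
    tcp = packet[ihl:ihl + 20]
    if proto != 6 or len(tcp) < 20:
        return row

    # TCP fixed header (RFC 9293), 20 bytes.
    (sport, dport, _seq, _ack, off_flags,
     window, _tcp_checksum, urgent) = struct.unpack("!HHIIHHHH", tcp)

    row.update({
        "tcp_sport": sport,
        "tcp_dport": dport,
        "tcp_data_offset": (off_flags >> 12) * 4,  # header length in bytes
        "tcp_flags": off_flags & 0xFF,             # CWR..FIN control bits
        "tcp_window": window,
        "tcp_urgent": urgent,
    })
    return row
```

Rows produced this way could be stacked into a table, one row per packet, and passed to a gradient-boosted tree classifier such as XGBoost, the model the abstract names. Because every column corresponds to a field with a defined meaning in the protocol specification, the table stays small and avoids the padding and reshaping that an image or token sequence would need.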