FlowletFormer: Network Behavioral Semantic Aware Pre-training Model for Traffic Classification

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing network traffic pre-training models struggle to jointly model packet-level structure, flow-level behavioral patterns, hierarchical protocol semantics, and inter-packet contextual relationships. To address this, the paper proposes FlowletFormer, a BERT-based pre-training model designed specifically for network traffic analysis. It introduces three components: (1) a behavior-aware traffic representation that segments traffic into semantically meaningful units, (2) an embedding layer aligned with the protocol stack, and (3) field-specific and context-aware pre-training tasks that strengthen inter-packet and inter-flow learning. Together, these components integrate domain knowledge of networking protocols and improve the model's grasp of transmission principles such as the stateful connections of TCP. The reported experiments show that FlowletFormer outperforms existing methods in traffic representation quality, classification accuracy, and few-shot learning capability, which the authors present as a more robust and trustworthy basis for traffic analysis.

📝 Abstract
Network traffic classification using pre-training models has shown promising results, but existing methods struggle to capture packet structural characteristics, flow-level behaviors, hierarchical protocol semantics, and inter-packet contextual relationships. To address these challenges, we propose FlowletFormer, a BERT-based pre-training model specifically designed for network traffic analysis. FlowletFormer introduces a Coherent Behavior-Aware Traffic Representation Model for segmenting traffic into semantically meaningful units, a Protocol Stack Alignment-Based Embedding Layer to capture multilayer protocol semantics, and Field-Specific and Context-Aware Pretraining Tasks to enhance both inter-packet and inter-flow learning. Experimental results demonstrate that FlowletFormer significantly outperforms existing methods in the effectiveness of traffic representation, classification accuracy, and few-shot learning capability. Moreover, by effectively integrating domain-specific network knowledge, FlowletFormer shows better comprehension of the principles of network transmission (e.g., stateful connections of TCP), providing a more robust and trustworthy framework for traffic analysis.
Problem

Research questions and friction points this paper is trying to address.

Capturing packet structural characteristics and flow-level behaviors
Integrating hierarchical protocol semantics and inter-packet contextual relationships
Improving traffic representation quality, classification accuracy, and few-shot learning capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coherent Behavior-Aware Traffic Representation Model
Protocol Stack Alignment-Based Embedding Layer
Field-Specific and Context-Aware Pretraining Tasks
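
The abstract says traffic is segmented into semantically meaningful units but does not give the rule. As a rough illustration only, the sketch below uses the common networking notion of a flowlet: a burst of packets within one flow, separated from the next burst by an idle gap. The `Packet` type, the `split_into_flowlets` helper, and the 0.5-second `gap_threshold` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds (hypothetical field)
    payload: bytes    # raw packet bytes; unused by the splitter itself


def split_into_flowlets(
    packets: List[Packet], gap_threshold: float = 0.5
) -> List[List[Packet]]:
    """Group the packets of one flow into bursts separated by idle gaps.

    Assumption: a new flowlet starts whenever the inter-arrival time
    exceeds gap_threshold. The paper's actual criterion may differ.
    """
    flowlets: List[List[Packet]] = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        last = flowlets[-1][-1] if flowlets else None
        if last is not None and pkt.timestamp - last.timestamp <= gap_threshold:
            flowlets[-1].append(pkt)   # still inside the current burst
        else:
            flowlets.append([pkt])     # idle gap seen: open a new flowlet
    return flowlets


if __name__ == "__main__":
    flow = [Packet(t, b"") for t in (0.0, 0.1, 0.2, 1.5, 1.6, 5.0)]
    for i, flowlet in enumerate(split_into_flowlets(flow)):
        print(i, [p.timestamp for p in flowlet])
```

In a pipeline of this kind, each resulting flowlet would then be tokenized and passed to the embedding layer; the threshold would need tuning per dataset.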
Authors

Liming Liu
Tsinghua Shenzhen International Graduate School
Ruoyu Li
Shenzhen University
Qing Li
Peng Cheng Laboratory
Meijia Hou
Zhongguancun Laboratory
Yong Jiang
Tsinghua University
Mingwei Xu
Computer Science, Tsinghua University
Internet architecture