FlowXpert: Context-Aware Flow Embedding for Enhanced Traffic Detection in IoT Network

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high sparsity of conventional time- and length-based flow features and the lack of semantic modeling in existing detection methods for dynamic IoT traffic, this paper proposes a context-aware flow embedding method. It discards the highly sparse temporal and length-based features and instead extracts contextual semantic features of the source host. A lightweight embedding framework integrates DBSCAN-based unsupervised clustering with contrastive learning to learn fine-grained, robust semantic representations of network traffic. Evaluated on the real-world MAWI dataset, the method outperforms multiple state-of-the-art models in detection accuracy, generalization capability, and robustness against noise and adversarial perturbations. Its low computational overhead supports real-time deployment on resource-constrained IoT devices. The result is an approach to IoT anomaly detection that balances expressiveness, scalability, and practical deployability.

📝 Abstract
In Internet of Things (IoT) environments, continuous interaction among large numbers of devices generates complex, dynamic network traffic that poses significant challenges to rule-based detection approaches. Machine learning (ML)-based traffic detection, which identifies anomalous patterns and potential threats in this traffic, is therefore a critical component of network security. This study first identifies a significant issue with widely adopted feature extraction tools (e.g., CICFlowMeter): their extensive use of time- and length-related features leads to high sparsity, which adversely affects model convergence. Furthermore, existing traffic detection methods generally lack an embedding mechanism that efficiently and comprehensively captures the semantic characteristics of network traffic. To address these challenges, we propose a novel feature extraction tool that replaces traditional time and length features with context-aware semantic features related to the source host, improving the generalizability of the model. In addition, we design an embedding training framework that integrates the unsupervised DBSCAN clustering algorithm with a contrastive learning strategy to capture fine-grained semantic representations of traffic. Extensive empirical evaluations on the real-world MAWI dataset validate the proposed method in terms of detection accuracy, robustness, and generalization. Comparative experiments against several state-of-the-art (SOTA) models demonstrate the superior performance of our approach. Finally, we confirm its applicability and deployability in real-time scenarios.
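The abstract does not list which source-host context features the tool extracts. As a rough illustration only, the sketch below aggregates a few per-source-host statistics over a window of flows. The `Flow` record, the `source_host_context` function, and the three statistics (flow count, distinct destination IPs, distinct destination ports) are all hypothetical choices, not the paper's feature set.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Flow(NamedTuple):
    """Hypothetical minimal flow record; field names are assumptions."""
    src_ip: str
    dst_ip: str
    dst_port: int


def source_host_context(flows: List[Flow]) -> Dict[str, Dict[str, int]]:
    """Aggregate simple per-source-host statistics over a window of flows."""
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    counts = defaultdict(int)
    for f in flows:
        dst_ips[f.src_ip].add(f.dst_ip)
        dst_ports[f.src_ip].add(f.dst_port)
        counts[f.src_ip] += 1
    return {
        host: {
            "flow_count": counts[host],
            "distinct_dst_ips": len(dst_ips[host]),
            "distinct_dst_ports": len(dst_ports[host]),
        }
        for host in counts
    }


# Illustrative window: one host contacts many destinations on a single port,
# the other opens a single ordinary connection.
window = [
    Flow("10.0.0.5", "10.0.0.1", 22),
    Flow("10.0.0.5", "10.0.0.2", 22),
    Flow("10.0.0.5", "10.0.0.3", 22),
    Flow("10.0.0.9", "10.0.0.1", 443),
]
context = source_host_context(window)
```

Features of this kind describe what a host is doing across its flows rather than the timing or size of any single flow, which is the general idea behind a context-aware feature, assuming the paper's features follow a similar per-host aggregation.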
Problem

Research questions and friction points this paper is trying to address.

Time- and length-related features make IoT traffic feature vectors highly sparse, which hinders model convergence
Existing detection methods lack an embedding mechanism that captures the semantic characteristics of network traffic
Detection accuracy and generalization need to improve in dynamic IoT environments
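The sparsity concern can be made concrete by measuring, for each feature, the fraction of flows in which it is zero. The sketch below is a minimal illustration under stated assumptions: the function `column_sparsity`, the feature names, and the sample values are hypothetical and are not taken from the paper or from any real capture.

```python
from typing import Dict, List


def column_sparsity(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Return, for each feature name, the fraction of flows whose value is zero."""
    if not rows:
        return {}
    names = rows[0].keys()
    return {
        name: sum(1 for row in rows if row.get(name, 0.0) == 0.0) / len(rows)
        for name in names
    }


# Hypothetical flow records. The assumption here is that very short flows
# have too few packets for timing and length statistics to be non-zero.
flows = [
    {"fwd_iat_std": 0.0, "idle_mean": 0.0, "pkt_len_var": 0.0, "dst_port": 443},
    {"fwd_iat_std": 0.0, "idle_mean": 0.0, "pkt_len_var": 12.5, "dst_port": 53},
    {"fwd_iat_std": 3.1, "idle_mean": 0.0, "pkt_len_var": 0.0, "dst_port": 1883},
    {"fwd_iat_std": 0.0, "idle_mean": 0.0, "pkt_len_var": 0.0, "dst_port": 80},
]

sparsity = column_sparsity(flows)
for name, fraction in sparsity.items():
    print(f"{name}: {fraction:.2f} of flows are zero")
```

A column that is zero for nearly every flow carries little signal, and many such columns together give the model mostly-empty input vectors.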
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware semantic features replacing time/length attributes
DBSCAN clustering integrated with contrastive learning strategy
Unsupervised embedding framework capturing fine-grained traffic semantics
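One plausible way to combine DBSCAN with contrastive learning is to treat cluster ids as pseudo-labels: flows in the same cluster are positives, and all other flows are negatives. The sketch below shows that idea in plain Python. It is an assumption about how the two pieces could fit together, not the paper's training procedure; the loss is a generic supervised-contrastive-style loss, and `eps`, `min_pts`, `temperature`, and all sample data are illustrative.

```python
import math
from typing import List, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN; returns one cluster id per point, NOISE for outliers."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def contrastive_loss(embeddings, labels, temperature: float = 0.5) -> float:
    """Supervised-contrastive-style loss using cluster ids as pseudo-labels."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        if labels[i] == NOISE:
            continue  # noise points are never anchors, but still act as negatives
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        log_denom = math.log(sum(math.exp(sims[k]) for k in range(n) if k != i))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Illustrative 2-D flow features: two tight groups and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (20.0, 20.0)]
pseudo_labels = dbscan(features, eps=0.5, min_pts=3)

# Embeddings that agree with the clusters should score a lower loss
# than embeddings that mix the clusters together.
aligned = [(1, 0), (0.9, 0.1), (0.95, 0.05), (0, 1), (0.1, 0.9), (0.05, 0.95), (0.7, 0.7)]
mixed = [(1, 0), (0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (0.7, 0.7)]
loss_aligned = contrastive_loss(aligned, pseudo_labels)
loss_mixed = contrastive_loss(mixed, pseudo_labels)
```

In a real training loop the embeddings would come from a trainable encoder and the loss would be minimized by gradient descent. Here the two fixed embedding sets only show that the loss rewards embeddings that keep same-cluster flows close.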
Authors
Chao Zha
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Research Center for High Efficiency Computing Infrastructure, Zhejiang Lab, Hangzhou 311100, Zhejiang, China; University of the Chinese Academy of Sciences, Beijing 100049, China
Haolin Pan
Institute of Software, Chinese Academy of Sciences
AI for Compiler, SIMD Optimization, Compiler Technology
Bing Bai
Assistant Professor of Radiology, University of Southern California
Image reconstruction, Image processing, Radiology, PET, CT
Jiangxing Wu
National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450003, China
Ruyun Zhang
Research Center for High Efficiency Computing Infrastructure, Zhejiang Lab, Hangzhou 311100, Zhejiang, China