🤖 AI Summary
This study addresses the challenges of network traffic classification under scarce labeled data and stringent privacy constraints, where conventional generative approaches struggle to balance temporal modeling fidelity with computational efficiency. For the first time, it systematically evaluates lightweight generative AI models—including Transformers, state space models, and diffusion models—for synthetic traffic generation, proposing an efficient and privacy-preserving synthesis framework. The generated traffic accurately preserves both static and dynamic temporal characteristics. Notably, classifiers trained exclusively on synthetic data achieve an F1-score of 87% on real-world traffic. Furthermore, in low-data regimes, the proposed data augmentation strategy improves classification performance by up to 40%, substantially narrowing the gap with models trained on full datasets.
📝 Abstract
Accurate Network Traffic Classification (NTC) is increasingly constrained by limited labeled data and strict privacy requirements. While Network Traffic Generation (NTG) provides an effective means to mitigate data scarcity, conventional generative methods struggle to model the complex temporal dynamics of modern traffic, or incur significant computational cost in doing so. In this article, we address the NTG task using lightweight Generative Artificial Intelligence (GenAI) architectures, including transformer-based, state-space, and diffusion models designed for practical deployment. We conduct a systematic evaluation along four axes: (i) fidelity of the synthetic traffic, (ii) synthetic-only training, (iii) data augmentation under low-data regimes, and (iv) computational efficiency. Experiments on two heterogeneous datasets show that lightweight GenAI models preserve both static and temporal traffic characteristics, with transformer and state-space models closely matching real distributions across a complete set of fidelity metrics. Classifiers trained solely on synthetic traffic achieve up to 87% F1-score on real data. In low-data settings, GenAI-driven augmentation improves NTC performance by up to +40%, substantially reducing the gap with full-data training. Overall, transformer-based models provide the best trade-off between fidelity and efficiency, enabling high-quality, privacy-aware traffic synthesis with modest computational overhead.
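The synthetic-only training and low-data augmentation axes above follow a common evaluation protocol: train a classifier on generated (or generated-plus-real) flow features, then score it with F1 on held-out real traffic. The toy sketch below illustrates that protocol only; the Gaussian `synthesize` stand-in, the nearest-centroid classifier, and all feature dimensions are illustrative assumptions, not the paper's transformer/state-space/diffusion generators or its NTC models.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_traffic(n, mean, std=0.5):
    # toy "flow feature" vectors (e.g., packet-size / inter-arrival stats)
    return rng.normal(mean, std, size=(n, 4))

# low-data regime: only 3 labeled real flows per class
real = {0: make_traffic(3, 0.0), 1: make_traffic(3, 5.0)}

def synthesize(samples, n):
    # stand-in generator: Gaussian fitted to the few real samples;
    # the paper uses lightweight GenAI models (transformer, SSM, diffusion)
    mu, sigma = samples.mean(0), samples.std(0) + 1e-6
    return rng.normal(mu, sigma, size=(n, samples.shape[1]))

def nearest_centroid_f1(train, test_x, test_y):
    # minimal classifier: assign each flow to the closest class centroid,
    # then compute F1 for the positive class (label 1)
    cents = {c: x.mean(0) for c, x in train.items()}
    pred = np.array([min(cents, key=lambda c: np.linalg.norm(x - cents[c]))
                     for x in test_x])
    tp = np.sum((pred == 1) & (test_y == 1))
    fp = np.sum((pred == 1) & (test_y == 0))
    fn = np.sum((pred == 0) & (test_y == 1))
    return 2 * tp / (2 * tp + fp + fn)

# held-out *real* test traffic, as in the paper's evaluation protocol
test_x = np.vstack([make_traffic(50, 0.0), make_traffic(50, 5.0)])
test_y = np.array([0] * 50 + [1] * 50)

f1_low = nearest_centroid_f1(real, test_x, test_y)            # real-only, low data
aug = {c: np.vstack([x, synthesize(x, 50)]) for c, x in real.items()}
f1_aug = nearest_centroid_f1(aug, test_x, test_y)             # real + synthetic
```

On this deliberately easy toy data both settings score highly; the point is the shape of the pipeline (fit generator on scarce real data, augment, evaluate on real test traffic), not the numbers.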