🤖 AI Summary
This study addresses the challenges of network traffic classification under scarce labeled data and stringent privacy constraints, where conventional generative approaches struggle to balance temporal modeling fidelity with computational efficiency. For the first time, it systematically evaluates lightweight generative AI models—including Transformers, state space models, and diffusion models—for synthetic traffic generation, proposing an efficient and privacy-preserving synthesis framework. The generated traffic accurately preserves both static and dynamic temporal characteristics. Notably, classifiers trained exclusively on synthetic data achieve an F1-score of 87% on real-world traffic. Furthermore, in low-data regimes, the proposed data augmentation strategy improves classification performance by up to 40%, substantially narrowing the gap with models trained on full datasets.
📝 Abstract
Accurate Network Traffic Classification (NTC) is increasingly constrained by limited labeled data and strict privacy requirements. While Network Traffic Generation (NTG) provides an effective means to mitigate data scarcity, conventional generative methods struggle to model the complex temporal dynamics of modern traffic, or incur significant computational cost in doing so. In this article, we address the NTG task using lightweight Generative Artificial Intelligence (GenAI) architectures, including transformer-based, state-space, and diffusion models designed for practical deployment. We conduct a systematic evaluation along four axes: (i) fidelity of the synthetic traffic, (ii) synthetic-only training, (iii) data augmentation under low-data regimes, and (iv) computational efficiency. Experiments on two heterogeneous datasets show that lightweight GenAI models preserve both static and temporal traffic characteristics, with transformer and state-space models closely matching real distributions across a complete set of fidelity metrics. Classifiers trained solely on synthetic traffic achieve up to 87% F1-score on real data. In low-data settings, GenAI-driven augmentation improves NTC performance by up to +40%, substantially reducing the gap with full-data training. Overall, transformer-based models provide the best trade-off between fidelity and efficiency, enabling high-quality, privacy-aware traffic synthesis with modest computational overhead.
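The synthetic-only training and low-data augmentation axes above follow a common evaluation protocol: train a classifier on generated (or generated-plus-real) flow features, then score it with F1 on held-out real traffic. The toy sketch below illustrates that protocol only; the Gaussian `synthesize` stand-in, the nearest-centroid classifier, and all feature dimensions are illustrative assumptions, not the paper's transformer/state-space/diffusion generators or its NTC models.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_traffic(n, mean, std=0.5):
    # toy "flow feature" vectors (e.g., packet-size / inter-arrival stats)
    return rng.normal(mean, std, size=(n, 4))

# low-data regime: only 3 labeled real flows per class
real = {0: make_traffic(3, 0.0), 1: make_traffic(3, 5.0)}

def synthesize(samples, n):
    # stand-in generator: Gaussian fitted to the few real samples;
    # the paper uses lightweight GenAI models (transformer, SSM, diffusion)
    mu, sigma = samples.mean(0), samples.std(0) + 1e-6
    return rng.normal(mu, sigma, size=(n, samples.shape[1]))

def nearest_centroid_f1(train, test_x, test_y):
    # minimal classifier: assign each flow to the closest class centroid,
    # then compute F1 for the positive class (label 1)
    cents = {c: x.mean(0) for c, x in train.items()}
    pred = np.array([min(cents, key=lambda c: np.linalg.norm(x - cents[c]))
                     for x in test_x])
    tp = np.sum((pred == 1) & (test_y == 1))
    fp = np.sum((pred == 1) & (test_y == 0))
    fn = np.sum((pred == 0) & (test_y == 1))
    return 2 * tp / (2 * tp + fp + fn)

# held-out *real* test traffic, as in the paper's evaluation protocol
test_x = np.vstack([make_traffic(50, 0.0), make_traffic(50, 5.0)])
test_y = np.array([0] * 50 + [1] * 50)

f1_low = nearest_centroid_f1(real, test_x, test_y)            # real-only, low data
aug = {c: np.vstack([x, synthesize(x, 50)]) for c, x in real.items()}
f1_aug = nearest_centroid_f1(aug, test_x, test_y)             # real + synthetic
```

On this deliberately easy toy data both settings score highly; the point is the shape of the pipeline (fit generator on scarce real data, augment, evaluate on real test traffic), not the numbers.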