Network Traffic Classification Using Self-Supervised Learning and Confident Learning

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The rapid proliferation of encrypted traffic and dynamic port usage has rendered conventional network traffic classification (NTC) methods ineffective, while supervised learning approaches suffer from heavy reliance on large-scale labeled data and unsupervised methods yield insufficient accuracy. To address these challenges, this paper proposes a novel framework integrating self-supervised learning with confident learning. Specifically, it employs an autoencoder to extract traffic representations and introduces Tabular Contrastive Learning (TabCL)—a network-traffic-optimized contrastive learning scheme—to generate high-quality pseudo-labels. Confident learning (CL) is further incorporated to explicitly model and rectify label noise in the pseudo-labels. Extensive experiments on three real-world datasets demonstrate that the proposed method achieves substantial improvements in classification accuracy under extremely low labeling costs, outperforming existing state-of-the-art approaches. The framework exhibits high accuracy, strong generalization across diverse traffic scenarios, and practical deployability in real-world network environments.

📝 Abstract

Network traffic classification (NTC) is vital for efficient network management, security, and performance optimization, particularly with 5G/6G technologies. Traditional methods, such as deep packet inspection (DPI) and port-based identification, struggle with the rise of encrypted traffic and dynamic port allocations. Supervised learning methods provide viable alternatives but rely on large labeled datasets, which are difficult to acquire given the diversity and volume of network traffic. Meanwhile, unsupervised learning methods, while less reliant on labeled data, often exhibit lower accuracy. To address these limitations, we propose a novel framework that first leverages Self-Supervised Learning (SSL), with techniques such as autoencoders or Tabular Contrastive Learning (TabCL), to generate pseudo-labels from extensive unlabeled datasets, addressing the challenge of limited labeled data. We then apply traffic-adapted Confident Learning (CL) to refine these pseudo-labels, enhancing classification precision by mitigating the impact of label noise. Our proposed framework offers a generalizable solution that minimizes the need for extensive labeled data while delivering high accuracy. Extensive simulations and evaluations on three datasets (ISCX VPN-nonVPN, a self-generated dataset, and UCDavis-QUIC) demonstrate that our method achieves superior accuracy compared to state-of-the-art techniques in classifying network traffic.

Problem

Research questions and friction points this paper is trying to address.

Addresses encrypted traffic classification challenges with limited labeled data

Leverages self-supervised learning to generate pseudo-labels from unlabeled datasets

Refines pseudo-labels using confident learning to enhance classification accuracy
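
As a rough illustration of the pseudo-labeling step listed above, the sketch below assigns labels to unlabeled flows by nearest class centroid in an embedding space. This is an assumption made for illustration only: the abstract does not say how pseudo-labels are derived from the SSL representations, and the function names `class_centroids` and `pseudo_label`, the toy two-dimensional embeddings, and the integer class ids are all hypothetical. The embeddings stand in for the output of a pre-trained encoder (autoencoder or TabCL), which is not shown.

```python
import math


def class_centroids(embeddings, labels):
    """Mean embedding per class, computed from a small labeled subset.

    `embeddings` is a list of equal-length float vectors that would come
    from a pre-trained SSL encoder (assumed here, not implemented).
    """
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for d, value in enumerate(vec):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def pseudo_label(embeddings, centroids):
    """Give each unlabeled embedding the class of its nearest centroid."""
    return [
        min(centroids, key=lambda y: math.dist(vec, centroids[y]))
        for vec in embeddings
    ]


# Toy data: four labeled flows in a 2-D embedding space, two classes.
labeled = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 1, 1]
unlabeled = [[0.2, 0.4], [5.0, 6.0], [1.0, 1.0]]

centroids = class_centroids(labeled, labels)
print(pseudo_label(unlabeled, centroids))
```

Pseudo-labels produced this way are expected to contain errors near class boundaries, which is the kind of noise a later refinement stage would have to handle.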

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses self-supervised learning for pseudo-label generation

Applies confident learning to refine noisy labels

Minimizes labeled data need while ensuring high accuracy
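
The refinement idea in the list above can be sketched with the basic confident learning rule: a sample is counted as confidently belonging to a class when its predicted probability for that class reaches the class's average self-confidence, and it is flagged when that confident class disagrees with its given label. This is a minimal sketch of generic confident learning, not the paper's traffic-adapted variant, whose details are not given here. The function names and the toy probabilities are hypothetical, and the probabilities are assumed to be out-of-sample predictions from some classifier trained on the pseudo-labels.

```python
def class_thresholds(labels, probs, num_classes):
    """Per-class threshold: mean predicted probability of class j
    over the samples whose given (possibly noisy) label is j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no samples gets threshold 1.0, so it is only ever
    # chosen when the model is fully certain.
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(num_classes)]


def find_label_issues(labels, probs, num_classes):
    """Return (index, given_label, suggested_label) for samples whose
    confident class differs from the given label."""
    thresholds = class_thresholds(labels, probs, num_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class is confident enough; leave the label alone
        best = max(confident, key=lambda j: p[j])
        if best != y:
            issues.append((i, y, best))
    return issues


# Toy data: sample 2 carries pseudo-label 0, but the model strongly prefers class 1.
noisy_labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],
]
print(find_label_issues(noisy_labels, pred_probs, num_classes=2))
```

Flagged samples can then be dropped or relabeled before the final classifier is trained; which of the two the framework does is not stated in the abstract.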
