Bias in the Shadows: Explore Shortcuts in Encrypted Network Traffic Classification

📅 2026-01-15
🤖 AI Summary
This work addresses the poor generalization of encrypted network traffic classification models, which often stems from shortcut learning, that is, reliance on spurious correlations in the training data. The authors propose BiasSeeker, a semi-automated, model-agnostic, data-driven framework for identifying dataset-specific shortcut features. By running statistical correlation analysis directly on raw traffic bytes, independent of any classifier, BiasSeeker flags spurious or environment-entangled features, groups them into a systematic categorization, and applies category-specific validation strategies that reduce bias while preserving meaningful information. The framework is evaluated on 19 public datasets across three encrypted traffic classification tasks. The work argues that feature selection should be an intentional, context-sensitive step performed before model training in order to mitigate shortcut learning and improve robustness.

📝 Abstract
Pre-trained models operating directly on raw bytes have achieved promising performance in encrypted network traffic classification (NTC), but often suffer from shortcut learning: relying on spurious correlations that fail to generalize to real-world data. Existing solutions rely heavily on model-specific interpretation techniques, which lack adaptability and generality across different model architectures and deployment scenarios. In this paper, we propose BiasSeeker, the first semi-automated framework that is both model-agnostic and data-driven for detecting dataset-specific shortcut features in encrypted traffic. By performing statistical correlation analysis directly on raw binary traffic, BiasSeeker identifies spurious or environment-entangled features that may compromise generalization, independent of any classifier. To address the diverse nature of shortcut features, we introduce a systematic categorization and apply category-specific validation strategies that reduce bias while preserving meaningful information. We evaluate BiasSeeker on 19 public datasets across three NTC tasks. By emphasizing context-aware feature selection and dataset-specific diagnosis, BiasSeeker offers a novel perspective for understanding and addressing shortcut learning in encrypted network traffic classification, raising awareness that feature selection should be an intentional and scenario-sensitive step prior to model training.
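
To make the idea of classifier-independent correlation analysis on raw bytes concrete, here is a minimal sketch. It is not BiasSeeker's actual procedure: the abstract does not name the statistic it uses, so empirical mutual information between each byte offset and the class label is swapped in as an illustrative stand-in. The function names (`mutual_information`, `rank_byte_offsets`), the `-1` padding symbol, and the toy packets are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(values)
    joint = Counter(zip(values, labels))
    value_counts = Counter(values)
    label_counts = Counter(labels)
    mi = 0.0
    for (v, l), c in joint.items():
        # p(v, l) * log2( p(v, l) / (p(v) * p(l)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (value_counts[v] * label_counts[l]))
    return mi


def rank_byte_offsets(packets, labels, max_len=64):
    """Score each byte offset by its mutual information with the label.

    Assumption: offsets beyond a packet's length are filled with -1,
    a padding symbol that cannot collide with a real byte value (0-255).
    """
    scores = []
    for offset in range(max_len):
        column = [p[offset] if offset < len(p) else -1 for p in packets]
        scores.append((offset, mutual_information(column, labels)))
    # Highest score first: these offsets are shortcut *candidates* only.
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data (hypothetical): byte 0 mirrors the label, as a capture-specific
# address byte might; byte 1 varies independently; byte 2 is constant.
packets = [bytes([10, 0, 7]), bytes([10, 1, 7]),
           bytes([20, 0, 7]), bytes([20, 1, 7])]
labels = ["vpn", "vpn", "web", "web"]

ranked = rank_byte_offsets(packets, labels, max_len=3)
```

In this toy case, offset 0 carries the full 1 bit of label information while offsets 1 and 2 carry none, so offset 0 ranks first. A high score only marks a candidate: whether the field is a spurious, environment-entangled shortcut or a legitimately informative feature still has to be decided in context, which is the role the abstract assigns to categorization and category-specific validation.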
Problem

Research questions and friction points this paper is trying to address.

shortcut learning
encrypted network traffic classification
spurious correlations
dataset bias
model generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

shortcut learning
encrypted traffic classification
model-agnostic
bias detection
feature selection
Chuyi Wang
Department of Computer Science and Technology, Tsinghua University
Xiaohui Xie
Professor of Computer Science, University of California, Irvine
AI, Machine Learning, Genomics, Neural Computation
Tongze Wang
Department of Computer Science and Technology, Tsinghua University
Yong Cui
Professor of Computer Science, Tsinghua University
Network Architecture, Mobile Computing