🤖 AI Summary
This work proposes a compact, end-to-end byte-level network traffic classifier based on the Mamba-2 architecture that operates directly on raw packet byte sequences without tokenization or self-supervised pretraining. By leveraging residual pre-normalized Mamba-2 blocks, byte embeddings, and a learnable CLS token, the model is trained in a supervised manner on fixed-length burst sequences, fully preserving temporal information. Evaluated on six public benchmarks, the method matches or exceeds the performance of more complex pretrained baselines while offering faster training and strong generalization. To the best of our knowledge, this is the first approach to achieve efficient, purely byte-level traffic classification without any reliance on pretraining.
📝 Abstract
We present MambaNetBurst, a compact, tokenizer-free, byte-level sequence classifier for network burst classification based on a Mamba-2 backbone. In contrast to most recent strong traffic-classification and intrusion-detection approaches, our method operates directly on raw packet bytes, avoids tokenization, patching, and heavily engineered multimodal representations, and does not require any self-supervised pre-training stage. Given a packet flow, we form a fixed-length burst from the first few packets, embed the resulting byte sequence, append a learnable CLS token, and process the sequence with a stack of residual pre-normalized Mamba-2 blocks for end-to-end supervised classification. Across six public benchmarks spanning encrypted mobile app identification, VPN/Tor traffic classification, malware traffic classification, and IoT attack traffic, MambaNetBurst achieves consistently strong results and is competitive with, or outperforms, substantially heavier and often pre-trained baselines. Our ablation study shows that preserving byte-level temporal resolution is critical, that early downsampling through striding is consistently harmful, and that moderate state sizes are sufficient for robust generalization.
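The burst-construction step described above can be illustrated with a minimal sketch. The abstract does not state how many packets are used, how many bytes are kept per packet, or how short packets are padded, so `num_packets`, `bytes_per_packet`, `PAD_ID`, and `CLS_ID` below are hypothetical choices made only for illustration, and `build_burst` is not a function from the authors' code.

```python
from typing import List, Sequence

# Assumed vocabulary layout (not from the paper): ids 0..255 are raw byte
# values, and two extra ids are reserved for padding and the CLS token.
PAD_ID = 256
CLS_ID = 257


def build_burst(
    packets: Sequence[bytes],
    num_packets: int = 4,        # hypothetical: "first few packets"
    bytes_per_packet: int = 64,  # hypothetical per-packet byte budget
) -> List[int]:
    """Turn the first packets of a flow into one fixed-length id sequence.

    Each packet is truncated or padded to `bytes_per_packet`, the packets
    are concatenated in arrival order, and a CLS id is appended at the end.
    No striding or patching is applied, so every byte keeps its position.
    """
    ids: List[int] = []
    for i in range(num_packets):
        # Missing packets (short flows) are treated as empty and fully padded.
        payload = packets[i][:bytes_per_packet] if i < len(packets) else b""
        ids.extend(payload)  # iterating over bytes yields ints in 0..255
        ids.extend([PAD_ID] * (bytes_per_packet - len(payload)))
    ids.append(CLS_ID)
    return ids
```

Under these assumptions every flow maps to a sequence of `num_packets * bytes_per_packet + 1` ids, which an embedding table with 258 entries could then turn into vectors for the Mamba-2 stack. Placing the CLS id last suits a causal sequence model, since only the final position has seen the whole burst.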
We further show that Mamba-2, despite its more constrained transition structure relative to Mamba-1, remains highly effective for packet-byte modeling while providing clear efficiency advantages, particularly in training speed. Overall, our results demonstrate that direct, undiluted byte-to-classification learning with compact selective state space models is a practical, effective, and novel direction for efficient, deployable traffic analysis that bypasses the complexity of pre-training pipelines, even when compared with highly optimized linear attention architectures.
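For readers unfamiliar with the distinction, the "more constrained transition structure" can be sketched as follows. The notation is ours rather than the paper's, and the discretization step is omitted for brevity. Both models apply a selective (input-dependent) state space recurrence per channel, with state $h_t \in \mathbb{R}^N$:

$$
h_t = A_t\, h_{t-1} + B_t\, x_t, \qquad y_t = C_t^{\top} h_t
$$

$$
\text{Mamba-1: } A_t = \operatorname{diag}(a_{t,1}, \dots, a_{t,N}), \qquad
\text{Mamba-2: } A_t = a_t\, I_N
$$

In Mamba-2 a single scalar $a_t$ governs the decay of all $N$ state dimensions at each step, whereas Mamba-1 allows each dimension its own decay. This restriction is what lets the Mamba-2 recurrence be evaluated with matrix-multiplication-friendly algorithms, which is consistent with the training-speed advantage reported above.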