🤖 AI Summary
Problem: In deployed network security systems, long-term, full-scale collection and storage of raw PCAP session data is infeasible, and the problem is most acute for rare attack samples. Method: This paper proposes a domain-knowledge-constrained autoencoder framework that reconstructs fine-grained network sessions, including per-packet structure, from coarse-grained feature vectors. It embeds formal network protocol syntax and semantics into the loss function as interpretable constraints, jointly optimizing classification-discriminative feature learning and session-level structural fidelity. The approach integrates deep autoencoding, protocol-aware modeling, and multimodal reconstruction. Results: Evaluated on real-world traffic benchmarks, the method achieves a 23.6% improvement in header-field reconstruction accuracy. It enables high-fidelity adversarial sample regeneration under low storage overhead and supports privacy-preserving model training.
📝 Abstract
The ability to reconstruct fine-grained network session data, including individual packets, from coarse-grained feature vectors is crucial for improving network security models. However, the large-scale collection and storage of raw network traffic pose significant challenges, particularly for capturing rare cyberattack samples. These challenges hinder the ability to retain comprehensive datasets for model training and future threat detection. To address this, we propose a machine learning approach guided by formal methods to encode and reconstruct network data. Our method employs autoencoder models with domain-informed penalties to impute PCAP session headers from structured feature representations. Experimental results demonstrate that incorporating domain knowledge through constraint-based loss terms significantly improves reconstruction accuracy, particularly for categorical features with session-level encodings. By enabling efficient reconstruction of detailed network sessions, our approach facilitates data-efficient model training while preserving privacy and storage efficiency.
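To make the idea of a constraint-based loss term concrete, the following is a minimal sketch and not the paper's implementation. It assumes a decoder that emits logits for two categorical header fields, and it adds a soft penalty for one illustrative protocol rule ("a UDP packet carries no TCP flags"). The vocabularies, the rule, the `weight` parameter, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical categorical vocabularies for two header fields.
PROTOCOLS = ["TCP", "UDP"]
FLAGS = ["NONE", "SYN", "ACK"]


def softmax(logits):
    """Convert raw decoder logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the true category."""
    return -math.log(max(probs[target_index], 1e-12))


def constraint_penalty(proto_probs, flag_probs):
    """Soft violation of the assumed rule 'UDP implies no TCP flags'.

    The penalty is the probability mass the decoder jointly places on
    'protocol is UDP' and 'some TCP flag is set', treating the two
    fields as independent. It lies in [0, 1].
    """
    p_udp = proto_probs[PROTOCOLS.index("UDP")]
    p_has_flag = 1.0 - flag_probs[FLAGS.index("NONE")]
    return p_udp * p_has_flag


def constrained_loss(proto_logits, flag_logits, true_proto, true_flag, weight=1.0):
    """Reconstruction loss plus a weighted domain-knowledge penalty."""
    proto_probs = softmax(proto_logits)
    flag_probs = softmax(flag_logits)
    reconstruction = (
        cross_entropy(proto_probs, PROTOCOLS.index(true_proto))
        + cross_entropy(flag_probs, FLAGS.index(true_flag))
    )
    return reconstruction + weight * constraint_penalty(proto_probs, flag_probs)


# A protocol-consistent prediction (UDP, no flags) versus an
# inconsistent one (UDP with SYN set).
consistent = constraint_penalty(softmax([-5.0, 5.0]), softmax([5.0, -5.0, -5.0]))
inconsistent = constraint_penalty(softmax([-5.0, 5.0]), softmax([-5.0, 5.0, -5.0]))
```

In this sketch the penalty is near zero for the consistent prediction and near one for the inconsistent one, so the extra term pushes the decoder away from header combinations that the assumed rule forbids even when the per-field reconstruction losses alone would not.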