🤖 AI Summary
Existing network traffic generators struggle to model multi-flow interactions and TCP state machines accurately because they decode directly to raw packet fields. That interface conflates behavioral semantics with protocol constraints and leaves packet realization dependent on heuristic post-hoc repair. This work proposes TraceCodec, a framework that pairs a state-aware neural codec with a deterministic protocol compiler. TraceCodec moves the generation space from raw packet headers to a structured space of packet actions (each comprising a timestamp, an explicit flow slot, and transport cues) and learns a continuous latent per packet, which decouples generative logic from protocol implementation. The compiler lowers decoded actions back to PCAPs and owns endpoint assignment, TCP state, legality constraints, and packet rendering, so traces are synthesized without post-generation correction. On the CICIDS2017 Monday dataset, TraceCodec matches packet count, protocol composition, and flow population to within 0.03%, whereas raw-field baselines under the same non-repair policy distort flow counts and TCP state by orders of magnitude. Structural diagnostics further show that it preserves TCP state transitions and multi-flow interleaving.
📝 Abstract
Critical networking workflows require high-fidelity packet captures (PCAPs) for testing, security analysis, and protocol validation, not just statistical flow-level summaries. Recent packet generators have demonstrated protocol-constrained PCAP synthesis, but they universally decode directly to raw packet fields. That interface entangles learned behavioral choices with deterministic protocol consequences, which forces packet realization to depend on post-hoc heuristic repair. We identify this decode interface as the fundamental bottleneck and present TraceCodec, a state-aware neural codec for stateful multi-flow traces. TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, then learns a continuous per-packet latent. A deterministic compiler lowers decoded actions back to PCAPs, owning endpoint assignment, TCP state, legality constraints, and packet rendering. The latent layer exposes a generator-facing sequence space, so downstream traffic models can operate on packet-action latents rather than raw header fields. On CICIDS2017 Monday, TraceCodec matches packet count, protocol composition, and flow population to within 0.03%. Raw-field baselines under the same non-repair policy distort flow counts and TCP state by orders of magnitude. Structural diagnostics show that TraceCodec preserves TCP state transitions and multi-flow interleaving that raw-field decoders fragment. This work establishes a new foundation for high-fidelity packet-trace generation.
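To make the action/compiler split concrete, here is a minimal toy sketch in Python. It is not TraceCodec's implementation: the names `PacketAction`, `ToyCompiler`, and `compile_trace`, the cue vocabulary (`"syn"`, `"synack"`, `"ack"`, `"data"`), the fixed addresses, and the initial sequence numbers are all assumptions made for illustration. It emits header-level records rather than PCAP bytes, and it omits the neural codec entirely. The point it illustrates is that the action carries only timing, a flow slot, and a transport cue, while the compiler deterministically derives endpoints, flags, and sequence numbers, and rejects illegal actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketAction:
    dt: float            # seconds since the previous packet in the trace
    slot: int            # flow slot index, not a concrete 5-tuple
    cue: str             # hypothetical cue: "syn", "synack", "ack", "data"
    from_client: bool    # direction within the flow
    payload_len: int = 0


# (state, cue, from_client) -> (next_state, tcp_flags).
# Any handshake step not listed here is treated as illegal.
HANDSHAKE = {
    ("CLOSED", "syn", True): ("SYN_SENT", "S"),
    ("SYN_SENT", "synack", False): ("SYN_RCVD", "SA"),
    ("SYN_RCVD", "ack", True): ("ESTABLISHED", "A"),
}


class ToyCompiler:
    """Deterministically lowers packet actions to header-level records."""

    def __init__(self):
        self.now = 0.0
        self.flows = {}  # slot -> per-flow state

    def _flow(self, slot):
        # Endpoint assignment is owned by the compiler, not by the model.
        if slot not in self.flows:
            self.flows[slot] = {
                "ends": {True: ("10.0.0.1", 40000 + slot),
                         False: ("10.0.0.2", 80)},
                "seq": {True: 1000, False: 5000},  # next seq per direction
                "state": "CLOSED",
            }
        return self.flows[slot]

    def lower(self, action):
        flow = self._flow(action.slot)
        key = (flow["state"], action.cue, action.from_client)
        if key in HANDSHAKE:
            next_state, flags = HANDSHAKE[key]
        elif flow["state"] == "ESTABLISHED" and action.cue == "data":
            next_state, flags = "ESTABLISHED", "PA"
        else:
            raise ValueError(
                f"illegal cue {action.cue!r} in state {flow['state']}")

        self.now += action.dt
        sender = action.from_client
        length = action.payload_len if action.cue == "data" else 0
        seq = flow["seq"][sender]
        ack = flow["seq"][not sender] if "A" in flags else 0
        # A SYN consumes one sequence number; payload consumes its length.
        flow["seq"][sender] += length + (1 if "S" in flags else 0)
        flow["state"] = next_state
        return {
            "ts": self.now,
            "src": flow["ends"][sender],
            "dst": flow["ends"][not sender],
            "flags": flags,
            "seq": seq,
            "ack": ack,
            "len": length,
        }


def compile_trace(actions):
    """Lower a whole action sequence with one shared compiler state."""
    compiler = ToyCompiler()
    return [compiler.lower(a) for a in actions]
```

In this sketch, two flows can interleave freely in the action sequence because each action names its slot, and a `"data"` cue on a slot that has not completed its handshake raises `ValueError` instead of producing a packet that would need repair afterwards.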