🤖 AI Summary
Machine learning models for networking often generalize poorly because of distributional shift (i.e., domain shift) between training and deployment environments. To address this, we propose NetReplica, a network emulation system that models a network as a parameterizable collection of bottleneck links. This design preserves the fidelity of TCP's dynamic behavior while enabling fine-grained, decoupled control over critical link properties, including bandwidth, latency, and packet loss. NetReplica combines real-world production network traces with controllable synthetic generation to mitigate data scarcity in rare network scenarios. Experiments on the Puffer dataset show that transmission-time prediction models trained with NetReplica-augmented data reduce prediction error by up to 47% under challenging network conditions, improving cross-domain robustness and out-of-distribution generalization.
📝 Abstract
Machine learning models in networking suffer from the domain adaptation problem: models trained in one domain often fail when deployed in different production environments. This paper presents the design and implementation of NetReplica, a system that addresses this challenge by generating training datasets with two critical properties: realism in protocol dynamics and controllability of network conditions. NetReplica models networks as collections of bottleneck links with specific attributes, achieves realism by leveraging production network traces, and enables controllability through fine-grained control knobs for each link attribute. Our evaluation using Puffer demonstrates that NetReplica not only matches existing data characteristics but also generates realistic samples that are underrepresented in or absent from Puffer data. Models trained on NetReplica-augmented datasets show substantially improved generalizability, reducing transmission-time prediction error by up to 47% for challenging network conditions compared to models trained solely on Puffer data. This work represents a significant step toward solving the domain adaptation problem that has limited the effectiveness of ML-based networking systems.
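
The idea of a network as a collection of bottleneck links, each with its own control knobs, can be illustrated with a minimal sketch. This is not NetReplica's actual interface: the `BottleneckLink` class, its field names, and the `path_summary` and `sweep` helpers are hypothetical names chosen for illustration, and the end-to-end aggregation assumes links in series with independent per-packet loss.

```python
from dataclasses import dataclass, replace
from math import prod
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class BottleneckLink:
    """Hypothetical link description; field names are illustrative only."""
    bandwidth_mbps: float
    latency_ms: float
    loss_rate: float  # per-packet drop probability in [0, 1)


def path_summary(links: Sequence[BottleneckLink]) -> Dict[str, float]:
    """Aggregate per-link attributes, assuming links in series."""
    return {
        # The slowest link caps the achievable rate.
        "bandwidth_mbps": min(link.bandwidth_mbps for link in links),
        # One-way delays add up along the path.
        "latency_ms": sum(link.latency_ms for link in links),
        # A packet survives only if every link forwards it.
        "loss_rate": 1.0 - prod(1.0 - link.loss_rate for link in links),
    }


def sweep(link: BottleneckLink, attribute: str,
          values: Sequence[float]) -> List[BottleneckLink]:
    """Vary one attribute while holding the others fixed (a 'control knob')."""
    return [replace(link, **{attribute: value}) for value in values]


if __name__ == "__main__":
    access = BottleneckLink(bandwidth_mbps=100.0, latency_ms=10.0, loss_rate=0.0)
    wan = BottleneckLink(bandwidth_mbps=20.0, latency_ms=30.0, loss_rate=0.5)
    print(path_summary([access, wan]))
    for variant in sweep(wan, "latency_ms", [5.0, 50.0]):
        print(variant)
```

Keeping each attribute as a separate field is what makes the control decoupled in this sketch: `sweep` changes latency without touching bandwidth or loss, which is the kind of targeted variation needed to generate samples for conditions that are rare in collected data.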