Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

📅 2025-04-29
🤖 AI Summary
Problem: Evaluating network hardware behavior under large-scale ML training workloads is costly and low-fidelity because it relies on expensive GPU-based testbeds. Method: This paper introduces Genie, a testing framework that generates realistic ML communication traffic solely on CPUs, with no GPUs required, and uses that traffic to drive physical network devices on a hardware testbed; it also enhances ASTRA-sim to enable co-simulation of network microarchitectures and ML workloads. Contribution/Results: Genie establishes the first “CPU-driven + hardware-measured + simulation-enhanced” paradigm for network–ML co-evaluation, coupling hardware-level network behavior with distributed training communication patterns at high fidelity without GPUs. Across representative training scenarios, it achieves over 90% network performance prediction accuracy while reducing testing costs by 10×, significantly accelerating network architecture design and validation cycles.

📝 Abstract
This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU-to-GPU communication, and adapts the ASTRA-sim simulator to model the interaction between the network and the ML workload.
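To make the idea of emulating GPU-to-GPU communication with CPU-initiated traffic more concrete, the sketch below lists the transfers a traffic generator would need to replay for one ring all-reduce. This is only an illustration under stated assumptions: the abstract does not say which collectives Genie emulates, and the names `Transfer` and `ring_allreduce_schedule` are hypothetical, not part of Genie or ASTRA-sim.

```python
from typing import List, NamedTuple


class Transfer(NamedTuple):
    """One point-to-point send that a CPU host would issue on behalf of a GPU."""
    step: int
    src: int
    dst: int
    num_bytes: int


def ring_allreduce_schedule(num_ranks: int, message_bytes: int) -> List[Transfer]:
    """Return the sends of a standard ring all-reduce (assumed pattern).

    A ring all-reduce runs 2 * (num_ranks - 1) steps: a reduce-scatter phase
    followed by an all-gather phase. In every step each rank sends one chunk
    of roughly message_bytes / num_ranks to its right-hand neighbor.
    """
    if num_ranks < 2:
        raise ValueError("a ring needs at least two ranks")
    chunk = -(-message_bytes // num_ranks)  # ceiling division
    schedule = []
    for step in range(2 * (num_ranks - 1)):
        for src in range(num_ranks):
            dst = (src + 1) % num_ranks
            schedule.append(Transfer(step, src, dst, chunk))
    return schedule


if __name__ == "__main__":
    for t in ring_allreduce_schedule(num_ranks=4, message_bytes=1024)[:4]:
        print(t)
```

A CPU-side generator could walk such a schedule and issue each send over the testbed network, so the switches see the same sizes and neighbor pattern they would see from GPUs, with no GPU computing the actual reduction.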
Problem

Research questions and friction points this paper is trying to address.

Testing network impact on ML performance without GPUs
Emulating GPU communication using CPU traffic
Modeling network-ML workload interaction via simulator
Innovation

Methods, ideas, or system contributions that make the work stand out.

CPU-initiated traffic emulates GPU communication
Hardware testbed for realistic network behavior
ASTRA-sim models network-workload interaction
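
As a rough illustration of how measured network behavior could feed back into a workload-level estimate, the sketch below applies the textbook latency-bandwidth cost model to a ring all-reduce. It is an assumption-laden example, not Genie's method or ASTRA-sim's API: the function name `ring_allreduce_time` and its parameters are hypothetical, and the bandwidth and latency values are placeholders standing in for numbers that would be measured on a hardware testbed.

```python
def ring_allreduce_time(
    num_ranks: int,
    message_bytes: float,
    bandwidth_bytes_per_s: float,
    latency_s: float,
) -> float:
    """Estimate ring all-reduce completion time with a latency-bandwidth model.

    Each of the 2 * (num_ranks - 1) steps moves one chunk of
    message_bytes / num_ranks per rank and pays one per-message latency.
    """
    if num_ranks < 2:
        raise ValueError("a ring needs at least two ranks")
    steps = 2 * (num_ranks - 1)
    chunk = message_bytes / num_ranks
    return steps * (latency_s + chunk / bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Placeholder values, not results from the paper.
    t = ring_allreduce_time(
        num_ranks=4,
        message_bytes=4_000_000,
        bandwidth_bytes_per_s=1e9,
        latency_s=1e-5,
    )
    print(f"estimated all-reduce time: {t * 1e3:.2f} ms")
```

A closed-form model like this ignores congestion, queueing, and device-specific effects, which is the gap that driving real network hardware is meant to close.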