🤖 AI Summary
This work identifies a previously overlooked interference mechanism: under generative AI workloads, high-bandwidth intra-node communication (e.g., over PCIe or NVLink) can degrade inter-node network performance. Blindly increasing intra-node bandwidth worsens congestion for traffic arriving from external nodes, reducing cross-node throughput by up to 40%. To investigate this systematically, the authors develop a unified OMNeT++-based simulation model that reproduces canonical generative AI traffic patterns while integrating both intra- and inter-node topologies. Their analysis identifies performance inflection points and configuration boundaries, demonstrating the diminishing returns of scaling bandwidth along a single dimension. The key contribution is a "co-optimization of intra- and inter-node networks" paradigm, which advocates joint scheduling of intra-node interconnect and inter-node network resources. This framework provides theoretical insight and practical guidance for communication architecture design in heterogeneous accelerator clusters.
📝 Abstract
Over the past decade, specialized computing and storage devices, such as GPUs, TPUs, and high-speed storage, have been increasingly integrated into server nodes within supercomputers and data centers. The advent of high-bandwidth memory (HBM) has enabled a more compact design for these components, allowing multiple units to be interconnected within a single server node through intra-node networks such as PCIe, NVLink, or Ethernet. These networks make it possible to scale up the number of dedicated computing and storage devices per node, while inter-node networks link these devices across thousands of server nodes in large-scale computing systems. However, as communication demands among accelerators grow, especially in workloads like generative AI, both intra- and inter-node networks risk becoming critical bottlenecks. Although modern intra-node network architectures attempt to mitigate this issue by boosting bandwidth, we demonstrate in this paper that such an approach can inadvertently degrade inter-node communication: high-bandwidth intra-node traffic interferes with incoming traffic from external nodes, leading to congestion. To evaluate this phenomenon, we analyze the communication behavior of realistic traffic patterns commonly found in generative AI applications. Using OMNeT++, we develop a general simulation model that captures both intra- and inter-node network interactions. Through extensive simulations, our findings reveal that increasing intra-node bandwidth and the number of accelerators per node can actually hinder, rather than improve, overall inter-node communication performance.
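The interference effect the abstract describes can be illustrated with a deliberately simplified toy model, not the paper's OMNeT++ simulation: assume intra-node flows and inter-node ingress traffic compete for one shared link of fixed capacity, split proportionally under congestion. The capacities, demands, and the `shared_link_throughput` function below are all illustrative assumptions.

```python
# Toy contention sketch (assumption: a single shared link with proportional
# sharing and no priority for external traffic; NOT the paper's model).
def shared_link_throughput(capacity, intra_demand, inter_demand):
    """Return (intra, inter) throughput on one shared link.

    If total offered load fits, every flow gets its full demand;
    otherwise each flow is scaled down proportionally.
    """
    total = intra_demand + inter_demand
    if total <= capacity:
        return intra_demand, inter_demand
    scale = capacity / total
    return intra_demand * scale, inter_demand * scale

if __name__ == "__main__":
    capacity = 100.0      # hypothetical shared-link capacity (GB/s)
    inter_demand = 40.0   # hypothetical fixed inter-node ingress demand (GB/s)
    # Sweep: raising intra-node offered load squeezes inter-node throughput
    for intra in (20.0, 60.0, 100.0, 160.0):
        _, inter_tput = shared_link_throughput(capacity, intra, inter_demand)
        print(f"intra demand {intra:6.1f} -> inter-node throughput {inter_tput:5.1f}")
```

Under this assumption, once the sum of demands exceeds the shared capacity, every additional unit of intra-node load directly subtracts from inter-node throughput, which is the qualitative behavior the paper quantifies with realistic generative AI traffic patterns.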