🤖 AI Summary
Interconnection networks in current large-scale AI and HPC systems struggle to achieve low latency and cost efficiency at the same time. This work addresses that challenge by integrating multi-plane architectures into direct-connect networks such as HyperX. The paper presents the first integration of a multi-plane design into HyperX, in which each NIC port (or each NIC within a scale-up domain) is assigned to its own independent communication plane. This approach substantially reduces network diameter and, at identical hardware cost, outperforms state-of-the-art topologies, including multi-plane Fat-Tree, Dragonfly, and Dragonfly+, in bandwidth utilization and scalability. The proposed architecture thereby overcomes the long-standing trade-off between performance and economic feasibility in conventional interconnect designs.
📝 Abstract
Multi-plane architectures have become increasingly prevalent in the Fat-Tree networks of AI data centers. These designs use multiple ports on a single network interface card (NIC), or multiple NICs within a scale-up domain, and allocate each port or NIC to an independent network plane, so that the system as a whole is provisioned with multiple network planes. However, no prior work has applied multi-plane techniques to direct networks such as HyperX. This paper investigates the multi-plane HyperX network and shows that, compared with state-of-the-art topologies such as multi-plane Fat-Tree, Dragonfly, and Dragonfly+, the multi-plane HyperX architecture achieves a significantly smaller network diameter and superior cost-effectiveness.
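One plausible way to see why splitting a NIC across planes can shrink the diameter is sketched below. This is an illustrative model, not the paper's: it assumes a HyperX in which every dimension is fully connected with one link per switch pair, `S` switches per dimension, `T` endpoints per switch, and the full-bisection rule of thumb `T <= S // 2`. In such a network the minimal switch-to-switch hop count is the Hamming distance between switch coordinates, so the diameter equals the number of dimensions. If the same switch ASIC exposes more, slower ports when the NIC bandwidth is split across planes (the 64 x 800G versus 512 x 100G breakouts below are assumed, not taken from the paper), each plane gets a higher switch radix and needs fewer dimensions for the same endpoint count. The helper names `hyperx_hops`, `max_endpoints`, and `min_diameter` are hypothetical.

```python
from itertools import product


def hyperx_hops(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Minimal switch-to-switch hops between two HyperX switches.

    Each dimension is fully connected, so one hop corrects one differing
    coordinate; the minimal hop count is the Hamming distance.
    """
    return sum(x != y for x, y in zip(a, b))


def max_endpoints(radix: int, dims: int) -> tuple[int, int, int]:
    """Largest full-bisection HyperX buildable from `radix`-port switches.

    Illustrative assumptions: S switches per dimension, one link per switch
    pair in each dimension, T endpoints per switch, and T <= S // 2 so that
    cutting any dimension in half leaves enough links for full bisection.
    Returns (endpoints, S, T).
    """
    best = (0, 0, 0)
    for s in range(2, radix + 2):
        # Ports left for endpoints after dims * (s - 1) inter-switch links.
        t = min(radix - dims * (s - 1), s // 2)
        if t < 1:
            continue
        best = max(best, (t * s**dims, s, t))
    return best


def min_diameter(radix: int, endpoints: int, max_dims: int = 8) -> int:
    """Fewest dimensions (= diameter in switch hops) that reach `endpoints`."""
    for dims in range(1, max_dims + 1):
        if max_endpoints(radix, dims)[0] >= endpoints:
            return dims
    raise ValueError("target not reachable within max_dims")


# Brute-force check: a 3-D HyperX with 4 switches per dimension has diameter 3.
coords = list(product(range(4), repeat=3))
diameter = max(hyperx_hops(a, b) for a in coords for b in coords)

# Same switch, two assumed port breakouts; each plane must reach every endpoint.
single_plane = min_diameter(radix=64, endpoints=32_768)   # 64 x 800G ports
eight_planes = min_diameter(radix=512, endpoints=32_768)  # 512 x 100G ports per plane
print(diameter, single_plane, eight_planes)  # prints: 3 3 1
```

Under these assumptions, the single-plane network needs three dimensions to connect 32,768 endpoints, while each of the eight 100G planes reaches the same count with a single fully connected dimension. The model ignores cabling, switch count, and routing, so it says nothing about the cost comparison reported in the paper.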