Avoid Routing Polarization for OCS-based GPU Clusters

📅 2026-03-30
🤖 AI Summary
This work addresses the issue of routing polarization in optical circuit-switched (OCS) GPU clusters, where uneven inter-pod bandwidth allocation—stemming from circuit paths restricted to direct connections with spine switches—leads to leaf-spine link congestion and degraded machine learning training throughput. To mitigate this, the authors propose a leaf-switch-centric traffic scheduling paradigm and establish, for the first time, a theoretical sufficient condition that guarantees avoidance of routing polarization. They further design a polynomial-time algorithm for logical topology construction, replacing conventional mixed-integer programming approaches. Large-scale simulations demonstrate that, compared to baseline methods, their solution improves training throughput by up to 19.27% while reducing topology computation overhead by 99.16%.
📝 Abstract
Recent years have witnessed the growing deployment of optical circuit switches (OCSes) in commercial GPU clusters (e.g., the Google A3 GPU cluster) optimized for machine learning (ML) workloads. Such clusters adopt a three-tier leaf-spine-OCS topology: servers attach to leaf-layer electronic packet switches (EPSes); these leaf switches aggregate into spine-layer EPSes to form a Pod; and multiple Pods are interconnected via core-layer OCSes. Unlike EPSes, OCSes only support circuit-based paths between directly connected spine switches. This can induce a phenomenon termed routing polarization, in which the bandwidth requirements between specific pairs of Pods are fulfilled unevenly across the links of different spine switches. The resulting imbalance causes traffic contention and bottlenecks on specific leaf-to-spine links, ultimately reducing ML training throughput. To mitigate this issue, we introduce a leaf-centric paradigm that ensures traffic originating from the same leaf switch is evenly distributed across multiple spine switches with balanced loads. Through rigorous theoretical analysis, we establish a sufficient condition for avoiding routing polarization and propose a corresponding logical topology design algorithm with polynomial-time complexity. Large-scale simulations validate up to 19.27% throughput improvement and a 99.16% reduction in logical topology computation overhead compared to Mixed Integer Programming (MIP)-based methods.
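To make the polarization effect described above concrete, here is a minimal toy sketch. It is not the paper's algorithm or its sufficient condition. It assumes a single source Pod, and it assumes that traffic toward a remote Pod is split across spine switches in proportion to the number of OCS circuits each spine holds toward that Pod. The names `leaf_uplink_loads`, `max_uplink_load`, `circuits`, and `demand` are hypothetical and chosen only for illustration.

```python
def leaf_uplink_loads(circuits, demand):
    """Toy model of leaf-to-spine link load inside one source Pod.

    circuits[s][p]: number of OCS circuits from spine s to remote Pod p.
    demand[l][p]:   traffic from leaf l destined to remote Pod p.
    Assumption: traffic toward Pod p is split across spines in
    proportion to circuits[s][p], since only directly connected
    spines can carry it.
    Returns loads[l][s], the load on the link from leaf l to spine s.
    """
    num_spines = len(circuits)
    num_pods = len(circuits[0])
    # Total circuits toward each remote Pod, summed over all spines.
    totals = [sum(circuits[s][p] for s in range(num_spines))
              for p in range(num_pods)]
    loads = []
    for leaf_demand in demand:
        row = []
        for s in range(num_spines):
            load = sum(
                leaf_demand[p] * circuits[s][p] / totals[p]
                for p in range(num_pods)
                if totals[p] > 0
            )
            row.append(load)
        loads.append(row)
    return loads


def max_uplink_load(circuits, demand):
    """Heaviest leaf-to-spine link load under the toy model."""
    return max(max(row) for row in leaf_uplink_loads(circuits, demand))


# Two spines, two remote Pods, four circuits toward each Pod.
polarized = [[4, 0], [0, 4]]  # each Pod is reachable through one spine only
balanced = [[2, 2], [2, 2]]   # each Pod is reachable equally through both spines

# Leaf 0 sends 8 units to Pod 0; leaf 1 sends 8 units to Pod 1.
demand = [[8.0, 0.0], [0.0, 8.0]]

print(max_uplink_load(polarized, demand))
print(max_uplink_load(balanced, demand))
```

Both logical topologies provide the same total inter-Pod bandwidth. In the polarized one, each leaf's traffic is forced onto a single uplink, so the heaviest leaf-to-spine link in this toy example carries twice the load it carries in the balanced one.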
Problem

Research questions and friction points this paper is trying to address.

routing polarization
optical circuit switch
GPU cluster
traffic imbalance
ML training throughput
Innovation

Methods, ideas, or system contributions that make the work stand out.

routing polarization
optical circuit switch
leaf-centric paradigm
logical topology design
GPU cluster
Xinchi Han
Shanghai Jiao Tong University, Shanghai, China
Weihao Jiang
Shanghai Jiao Tong University, Shanghai, China
Yingming Mao
Xi’an Jiao Tong University, Xi’an, China
Yike Liu
Xi’an Jiao Tong University, Xi’an, China
Zhuoran Liu
Shanghai Jiao Tong University, Shanghai, China
Yongxi Lv
Shanghai Jiao Tong University, Shanghai, China
Peirui Cao
Nanjing University, Nanjing, China
Zhuotao Liu
Tsinghua University
Data/AI Privacy and Security, Datacenter Networking, Secure Internet, Blockchain/Web3.0 Infra
Ximeng Liu
Shanghai Jiao Tong University, Shanghai, China
Xinbing Wang
Shanghai Jiao Tong University, Shanghai, China
Changbo Wu
University of Science and Technology of China, Hefei, China & Shanghai Innovation Institute, Shanghai, China
Zihan Zhu
ETH Zurich
computer vision, computer graphics
Wu Dongchao
Huawei, Dongguan, China
Yang Jian
Huawei, Dongguan, China
Zhang Zhanbang
Huawei, Dongguan, China
Yuansen Chen
Huawei, Dongguan, China
Shizhen Zhao
Associate Professor, John Hopcroft Center, Shanghai Jiao Tong University
Hybrid Electrical/Optical Data Center Networks, Deterministic Networks, Network Optimization