LumosCore: Highly Scalable LLM Clusters with Optical Interconnect

📅 2024-11-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the demand for high-bandwidth, ultra-large-scale interconnects in large language model (LLM) training, this paper proposes LumosCore, a hybrid optical-electrical architecture in which optical circuit switches (OCS) replace the core-layer electrical packet switches. At the deployment stage, it introduces an interleaved optical wiring scheme that is compatible with all possible logical topologies. At the running stage, polynomial-time algorithms handle GPU placement, logical topology generation, and OCS reconfiguration to minimize network contention and limit the impact on already-scheduled jobs. Compared with traditional hybrid optical/electrical architectures, testbed experiments show up to 39.5% higher end-to-end training throughput on a 128-node testbed; compared with state-of-the-art Clos architectures, simulation at 16k scale shows up to a 34.1% reduction in average job completion time. Replacing the core-layer electrical packet switches enables either a 2× increase in bandwidth or an 8× increase in network size.

📝 Abstract
We propose LumosCore to build high-bandwidth and large-scale data center networks for LLM jobs. By replacing the core-layer electrical packet switches with optical circuit switches, LumosCore can achieve a 2× increase in bandwidth or an 8× increase in network size. We offer the detailed design of LumosCore at both the deployment stage and the running stage. At the deployment stage, we propose Interleaved Wiring, which is compatible with all possible logical topologies. At the running stage, we design polynomial-time algorithms for GPU placement, logical topology generation, and OCS reconfiguration to minimize network contention and reduce the impact on scheduled jobs. We evaluate LumosCore using both testbed experiments and large-scale simulation. Compared to traditional hybrid optical/electrical architectures, LumosCore increases the end-to-end training throughput by up to 39.5% on a 128-node testbed. Compared to state-of-the-art Clos architectures, LumosCore reduces the average job completion time by up to 34.1% on a 16k simulation platform.
Problem

Research questions and friction points this paper is trying to address.

Build high-bandwidth data center networks for LLM jobs.
Replace core-layer electrical packet switches with optical circuit switches.
Minimize network contention and reduce job completion time.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optical circuit switches replace core-layer electrical packet switches
Interleaved Wiring compatible with all possible logical topologies
Polynomial-time algorithms for GPU placement, logical topology generation, and OCS reconfiguration
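
The terms "logical topology" and "OCS reconfiguration" can be made concrete with a small sketch. This is not the paper's algorithm: it assumes a logical topology is modeled as a symmetric matrix of circuit counts between pods, and the names `is_feasible`, `circuits_to_tear_down`, and `uplinks_per_pod` are hypothetical. It shows only a necessary port-budget check and a simple count of how many existing circuits a reconfiguration would disturb.

```python
from typing import List

# Assumed model: topo[i][j] = number of optical circuits between pod i and pod j.
Topology = List[List[int]]


def is_feasible(topo: Topology, uplinks_per_pod: int) -> bool:
    """Necessary check only: symmetric, non-negative, no self-links,
    and no pod uses more circuits than it has OCS-facing uplinks."""
    n = len(topo)
    for i in range(n):
        if topo[i][i] != 0:
            return False
        if sum(topo[i]) > uplinks_per_pod:
            return False
        for j in range(n):
            if topo[i][j] < 0 or topo[i][j] != topo[j][i]:
                return False
    return True


def circuits_to_tear_down(current: Topology, target: Topology) -> int:
    """Count existing circuits that must be removed to reach the target.
    Each pod pair is counted once (upper triangle of the matrix)."""
    n = len(current)
    return sum(
        max(0, current[i][j] - target[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical 3-pod example with 4 OCS-facing uplinks per pod.
current = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
target = [[0, 3, 1], [3, 0, 1], [1, 1, 0]]

if __name__ == "__main__":
    print(is_feasible(target, uplinks_per_pod=4))
    print(circuits_to_tear_down(current, target))
```

A reconfiguration that tears down fewer circuits disturbs fewer running flows, which is one plausible reading of "reduce the impact on scheduled jobs"; the paper's actual objective and constraints may differ.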
Xinchi Han
Shanghai Jiao Tong University
Shizhen Zhao
Associate Professor, John Hopcroft Center, Shanghai Jiao Tong University
Hybrid Electrical/Optical Data Center Networks, Deterministic Networks, Network Optimization
Yongxi Lv
Shanghai Jiao Tong University
Peirui Cao
Shanghai Jiao Tong University
Weihao Jiang
Shanghai Jiao Tong University
Shengkai Lin
Shanghai Jiao Tong University
Network System, Network virtualization, RDMA
Xinbing Wang
Shanghai Jiao Tong University